BLEOMYCIN GENE CLUSTER COMPONENTS AND THEIR USES

CROSS-REFERENCE TO RELATED APPLICATIONS 

This application claims benefit under 35 U.S.C. §119 of provisional
applications USSN 60/115,435, filed on January 6, 1999, and USSN 60/118,848, filed on
February 5, 1999, both of which are herein incorporated by reference in their entirety for all
purposes.

STATEMENT AS TO RIGHTS TO INVENTIONS MADE UNDER FEDERALLY 
SPONSORED RESEARCH AND DEVELOPMENT 

This work was supported in part by an Institutional Research Grant from the
American Cancer Society and the School of Medicine, University of California, Davis,
National Institutes of Health Grant Number A140475, and a grant from the Searle Scholars
Program of the Chicago Community Trust. The Government of the United States of
America may have certain rights in this invention.

FIELD OF THE INVENTION 

This invention relates to the field of polyketide synthesis and nonribosomal
polypeptide synthesis. In particular, this invention pertains to the isolation of the bleomycin
gene cluster, which encodes the first identified hybrid polyketide synthase/nonribosomal
peptide synthetase pathway.

BACKGROUND OF THE INVENTION 

Polyketides and nonribosomal peptides are two large families of natural
products that include many clinically valuable drugs, such as erythromycin and vancomycin
(antibacterial), FK506 and cyclosporin (immunosuppressant), and epothilone and bleomycin
(BLM) (antitumor). The biosyntheses of polyketides and nonribosomal peptides are
catalyzed by polyketide synthases (PKSs) (Hopwood (1997) Chem. Rev. 97: 2465; Katz
(1997) Chem. Rev. 97: 2557; Khosla (1997) Chem. Rev. 97: 2577; Ikeda and Omura
(1997) Chem. Rev. 97: 2591; Staunton and Wilkinson (1997) Chem. Rev. 97: 2611; Cane et
al. (1998) Science 282: 63) and nonribosomal peptide synthetases (NRPSs) (Cane et
al. (1998) Science 282: 63; Marahiel et al. (1997) Chem. Rev. 97: 2651; von Dohren et al.
(1997) Chem. Rev. 97: 2675), respectively. Remarkably, PKSs and NRPSs use a very
similar strategy for the assembly of these two distinct classes of natural products by
sequential condensation of short carboxylic acids and amino acids, respectively, and utilize
the same 4'-phosphopantetheine prosthetic group, via a thioester linkage, to channel the
growing polyketide or peptide intermediate during the elongation processes.

Both type I PKSs and NRPSs are multifunctional proteins that are organized
into modules. (A module is defined as a set of distinctive domains that encode all the
enzyme activities necessary for one cycle of polyketide or peptide chain elongation and
associated modifications.) The number and order of modules and the type of domains within
a module on each PKS or NRPS protein determine the structural variations of the resulting
polyketide and peptide products by dictating the number, order, and choice of the carboxylic
acid or amino acid to be incorporated, and the modifications associated with a particular cycle
of elongation. These features of PKS and NRPS inspired us to search for a hybrid PKS and
NRPS system. Since the modular architecture of both PKS (Cane et al. (1998) Science 282:
63; Katz and Donadio (1993) Ann. Rev. Microbiol. 47: 875; Hutchinson and Fujii
(1995) Ann. Rev. Microbiol. 49: 201) and NRPS (Cane et al. (1998) Science 282: 63;
Stachelhaus et al. (1995) Science 269: 69; Stachelhaus et al. (1998) Mol. Gen. Genet. 257:
308; Belshaw et al. (1999) Science 284: 486) has been exploited successfully in
combinatorial biosynthesis of diverse "unnatural" natural products, we reasoned that a
hybrid PKS and NRPS system, capable of incorporating both carboxylic acids and amino
acids into the final products, could lead to even greater chemical structural diversity.

The BLMs, differing structurally at the C-terminal amines of the
glycopeptides, are a family of antibiotics produced by Streptomyces verticillus (Sv). BLMs
exhibit strong antitumor activity through a metal-dependent oxidative cleavage of DNA or
RNA in the presence of molecular oxygen and are incorporated into current chemotherapy of
several malignancies under the trade name Blenoxane®, which contains BLM A2 and BLM
B2 as the principal constituents (Sikic et al. Eds. (1985) Bleomycin Chemotherapy,
Academic Press, New York; Natrajan and Hecht (1994) pages 197-242 In: Molecular
Aspects of Anticancer Drug-DNA Interaction Vol. 2, Neidle and Waring Eds., Macmillan,
London). Umezawa, Fujii, Takita, and co-workers extensively studied the biosynthesis of
BLM in Sv ATCC 15003 by feeding isotope-labeled precursors and by isolating various
biosynthetic intermediates and shunt metabolites, establishing that the BLMs are in fact
natural hybrid metabolites of polyketide and peptide biosynthesis (Takita and Muroka (1990)
pages 289-309 In: Biochemistry of Peptide Antibiotics: Recent Advances in the
Biotechnology of β-Lactams and Microbial Peptides, Kleinkauf and Von Dohren Eds., W. de
Gruyter, New York). On the assumption that BLM biosynthesis follows the paradigm for
peptide and polyketide biosynthesis, we predict that the Blm megasynthetase, which
catalyzes the assembly of the BLM backbone from nine amino acids and one acetate, should
bear the characteristics of both NRPS and PKS, providing an excellent model to study the
mechanism by which NRPS and PKS could be integrated into a productive biosynthetic
system to synthesize a hybrid peptide and polyketide metabolite (Fig. 1A) (Shen et al. (1999)
Bioorg. Chem. 27: 155).

SUMMARY OF THE INVENTION 

This invention pertains to the isolation and elucidation of the bleomycin gene
cluster. Nucleic acid sequences encoding all of the open reading frames (ORFs) that encode
polypeptides sufficient to direct the biosynthesis of bleomycin are provided. The nucleic
acids can be used in their "native" format or recombined in a wide variety of manners to
create novel synthetic pathways.

In one embodiment, this invention provides an isolated nucleic acid
comprising a nucleic acid selected from the group consisting of a nucleic acid encoding any
one of Blm open reading frames (ORFs) 8 through 41, and/or a nucleic acid encoding a
polypeptide encoded by any one of Blm open reading frames (ORFs) 8 through 41, and/or a
nucleic acid amplified by polymerase chain reaction (PCR) using any one of the primer pairs
identified in Table II and the nucleic acid of a bleomycin-producing organism as a template.
The nucleic acid may comprise one or multiple (e.g. two, more preferably three or more)
bleomycin open reading frames (i.e. BLM ORFs 8 through 41). One preferred nucleic acid
comprises a nucleic acid encoding a C domain lacking one or more His residues of the
conserved HHxxxDG active site for transpeptidation. In another preferred embodiment the
nucleic acid comprises a nucleic acid encoding a protein encoded by a gene selected from the
group consisting of blmI, blmII, and blmXI.
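As an illustration only, the HHxxxDG motif mentioned above can be located with a naive pattern scan. The following Python sketch is not part of the disclosed method: the function name, the permissive pattern, and the example sequence are all hypothetical, and a real annotation would rely on alignment against known C domains rather than a bare regular expression.

```python
import re

# Naive pattern (an assumption for illustration): any two residues, any three
# residues, then Asp-Gly. The lookahead lets overlapping candidates be reported.
MOTIF = re.compile(r"(?=([A-Z]{2})[A-Z]{3}DG)")


def scan_c_domain_motif(protein):
    """Return (position, 7-residue window, His count in the first two positions)."""
    hits = []
    for match in MOTIF.finditer(protein):
        start = match.start()
        his_count = match.group(1).count("H")
        hits.append((start, protein[start:start + 7], his_count))
    return hits


# Made-up sequence, not taken from any Blm ORF.
example = "MKTHHAAADGLLSHQVVVDGK"
for start, window, his_count in scan_c_domain_motif(example):
    status = "intact HHxxxDG" if his_count == 2 else f"lacks {2 - his_count} His"
    print(start, window, status)
```

Because the pattern accepts any residues in the first two positions, it will also flag unrelated xxxxxDG stretches; the His count is what distinguishes an intact motif from a variant lacking one or both His residues.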

In another embodiment this invention provides an isolated nucleic acid
encoding a (biosynthetic) module comprising two or more (more preferably three or more,
most preferably four or more) catalytic domains of a protein encoded by a nucleic acid of a
bleomycin gene cluster wherein said catalytic domains are selected from the group consisting
of a condensation (C) domain, an adenylation (A) domain, a peptidyl carrier protein (PCP)
domain, a condensation/cyclization domain (Cy), an acyl-carrier protein (ACP)-like domain,
an oxidization domain (Ox), a ketoacyl synthase (KS) domain, an acetyl transferase (AT)
domain, a ketoreductase (KR) domain, and a methyltransferase (MT) domain. Preferred
nucleic acids comprise a nucleic acid encoding one or more proteins comprising a module
selected from the group consisting of NRPS-0, NRPS-1, NRPS-2, NRPS-3, NRPS-4,
NRPS-5, NRPS-6, NRPS-7, NRPS-8, NRPS-9, and PKS. Particularly preferred nucleic acids
comprise an open reading frame from SEQ ID NO: 1, SEQ ID NO: 2, or SEQ ID NO: 3.

In still another embodiment, this invention provides an isolated nucleic acid
comprising a nucleic acid encoding a protein encoded by a gene from a BLM gene cluster.
Preferred nucleic acids encode a protein encoded by a gene selected from the group
consisting of blmI, blmII, and blmXI. In another embodiment, preferred nucleic acids
encode a protein encoded by a gene selected from the group consisting of blmIII, blmIV,
blmV, blmVI, blmVII, blmIX, and blmX. In still yet another embodiment, the nucleic acid
comprises a nucleic acid encoding a protein encoded by blmVIII. Particularly preferred
nucleic acids comprise a nucleic acid selected from the group consisting of blmI, blmII, and
blmXI. Other particularly preferred nucleic acids comprise a nucleic acid selected from the
group consisting of blmIII, blmIV, blmV, blmVI, blmVII, blmIX, and blmX, while still other
particularly preferred nucleic acids comprise blmVIII.

In still yet another embodiment, this invention provides an isolated nucleic
acid comprising a nucleic acid that encodes a protein comprising at least one catalytic
domain selected from the group consisting of a condensation (C) domain, an adenylation (A)
domain, a peptidyl carrier protein (PCP) domain, a condensation/cyclization domain (Cy), an
acyl-carrier protein (ACP)-like domain, an oxidization domain (Ox), a ketoacyl synthase
(KS) domain, an acetyl transferase (AT) domain, a ketoreductase (KR) domain, and a
methyltransferase (MT) domain, and that hybridizes to a nucleic acid selected from the group
consisting of orf8, orf9, orf10, orf11, orf12, orf13, orf14, orf15, orf16, orf17, orf18,
orf19, orf20, orf21, orf22, orf23, orf24, orf25, orf26, orf27, orf28, orf29, orf30, orf31, orf32,
orf33, orf34, orf35, orf36, orf37, orf38, orf39, and orf40 under stringent conditions. In
certain embodiments this also includes nucleic acids that would hybridize under stringent
conditions as indicated above but for the degeneracy of the genetic code. In other words, if
silent mutations could be made in the subject sequence so that it hybridizes to the indicated
sequence(s) under stringent conditions, it would be included in certain embodiments. A
preferred isolated nucleic acid comprises a nucleic acid encoding a module. A particularly
preferred isolated nucleic acid comprises a nucleic acid encoding a BLM gene.

This invention also provides a nucleic acid comprising a nucleic acid selected
from the group consisting of orf8, orf9, orf10, orf11, orf12, orf13, orf14, orf15,
orf16, orf17, orf18, orf19, orf20, orf21, orf22, orf23, orf24, orf25, orf26, orf27, orf28,
orf29, orf30, orf31, orf32, orf33, orf34, orf35, orf36, orf37, orf38, orf39, and orf40, or an
allelic variant thereof. Preferred nucleic acids comprise a nucleic acid that is a single
nucleotide polymorphism (SNP) of a nucleic acid selected from the group consisting of
orf8, orf9, orf10, orf11, orf12, orf13, orf14, orf15, orf16, orf17, orf18,
orf19, orf20, orf21, orf22, orf23, orf24, orf25, orf26, orf27, orf28, orf29, orf30, orf31, orf32,
orf33, orf34, orf35, orf36, orf37, orf38, orf39, and orf40.

This invention also provides an isolated gene cluster comprising open reading 
frames encoding polypeptides sufficient to direct the assembly of a bleomycin. 

In one embodiment this invention provides an isolated multi-functional
protein complex comprising both a polyketide synthase (PKS) and a polypeptide synthetase
(NRPS) and/or an isolated nucleic acid encoding a multi-functional protein complex
comprising both a polyketide synthase (PKS) and a polypeptide synthetase (NRPS).

This invention also provides various blm cluster polypeptides or blm
cluster-derived polypeptides. Thus, in one embodiment this invention provides an isolated
polypeptide comprising a catalytic domain encoded by a nucleic acid of a bleomycin gene
cluster wherein said nucleic acid comprises a nucleic acid selected from the group consisting
of a nucleic acid encoding any one of Blm open reading frames (ORFs) 8 through 41; and/or
a nucleic acid amplified by polymerase chain reaction (PCR) using any one of the primer
pairs identified in Table II. Preferred polypeptides comprise an enzymatic domain selected
from the group consisting of a condensation (C) domain, an adenylation (A) domain, a
peptidyl carrier protein (PCP) domain, a condensation/cyclization domain (Cy), an
acyl-carrier protein (ACP)-like domain, an oxidization domain (Ox), a ketoacyl synthase (KS)
domain, an acetyl transferase (AT) domain, a ketoreductase (KR) domain, and a
methyltransferase (MT) domain. Particularly preferred polypeptides are encoded by the
nucleic acids described above and herein.

This invention also provides expression vectors comprising any of the nucleic
acids described herein and/or host cells (e.g. Streptomyces) transfected and/or transformed
with any of these expression vectors. A preferred host cell is transformed with an exogenous
nucleic acid comprising a gene cluster encoding polypeptides sufficient to direct the
assembly of a bleomycin or bleomycin analog.

This invention also provides methods of use of the blm and blm-derived
nucleic acid(s) and/or polypeptides. One such method is a method of chemically modifying
a biological molecule. The method involves contacting a biological molecule that is a
substrate for a polypeptide encoded by one or more bleomycin biosynthesis gene cluster
open reading frames with the polypeptide encoded by one or more bleomycin biosynthesis
gene cluster open reading frames, whereby the polypeptide chemically modifies the
biological molecule. In one particularly preferred embodiment, the biological molecule is an
amino acid and said polypeptide is a peptide synthetase. In another preferred embodiment,
the polypeptide is a methyl transferase. Other substrates and blm encoded polypeptides are
illustrated in Table II.

In another embodiment this invention provides a method of coupling a first
amino acid to a second amino acid. This method involves contacting the first and second
amino acid with a recombinantly expressed bleomycin nonribosomal peptide synthetase
(NRPS). A preferred NRPS is selected from the group consisting of NRPS-5, NRPS-4,
NRPS-3, NRPS-9, NRPS-8, and NRPS-7. Another preferred NRPS is selected from the
group consisting of NRPS-6, NRPS-2, NRPS-1, and NRPS-0. The contacting can be in vivo
(e.g. in a host cell) or ex vivo.

In another embodiment this invention provides a method of coupling a first
fatty acid to a second fatty acid, said method comprising contacting the first and second fatty
acids with a recombinantly expressed bleomycin polyketide synthase (PKS). Again, the
contacting can be in vivo (e.g. in a host cell) or ex vivo.

In still another embodiment, this invention provides a method of producing a
bleomycin or bleomycin analog. The method involves providing a cell transformed with an
exogenous nucleic acid comprising a bleomycin gene cluster encoding polypeptides
sufficient to direct the assembly of said bleomycin or bleomycin analog; culturing the cell
under conditions permitting the biosynthesis of bleomycin or bleomycin analog; and
isolating said bleomycin or bleomycin analog from said cell.

This invention also provides an isolated nucleic acid comprising a nucleic
acid encoding a phosphopantetheinyl transferase, said nucleic acid encoding a
phosphopantetheinyl transferase being selected from the group consisting of: a nucleic acid
encoding the protein encoded by the nucleic acid of SEQ ID NO: 3; a nucleic acid amplified
by polymerase chain reaction (PCR) using primers that specifically amplify ORF 41
(primers: SEQ ID NO: 71 and SEQ ID NO: 72) and Streptomyces nucleic acid as a template;
and a nucleic acid encoding a polypeptide having phosphopantetheinyl transferase activity
where said nucleic acid specifically hybridizes to the nucleic acid of SEQ ID NO: 3 under
stringent conditions. In one embodiment, the nucleic acid comprises the nucleic acid of
SEQ ID NO: 3.

In another embodiment, this invention provides a polypeptide comprising a
phosphopantetheinyl transferase encoded by SEQ ID NO: 3 or a polypeptide having
phosphopantetheinyl transferase activity and the sequence encoded by the nucleic acid of
SEQ ID NO: 3 or conservative substitutions of that polypeptide.

Also provided are vectors comprising a nucleic acid encoding a
phosphopantetheinyl transferase (e.g., as described above) and cells transfected with the
vector.

This invention also provides a method of converting an apo-carrier protein to
a holo-carrier protein, said method comprising reacting said apo-carrier protein with a
recombinant phosphopantetheinyl transferase encoded by SEQ ID NO: 3 and coenzyme A,
thereby producing a holo-carrier protein.

In certain embodiments, this invention specifically excludes one or more of 
open reading frames 1 through 41. In particularly preferred embodiments, this invention 
excludes open reading frames 1 through 7 (Orf 1- Orf 7). 

DEFINITIONS

The term "polyketide synthases" (PKSs) refers to multifunctional enzymes
related to fatty acid synthases (FASs). PKSs catalyze the biosynthesis of polyketides through
repeated (decarboxylative) Claisen condensations between acylthioesters, usually acetyl,
propionyl, malonyl or methylmalonyl. Following each condensation, they typically
introduce structural variability into the product by catalyzing all, part, or none of a reductive
cycle comprising a ketoreduction, dehydration, and enoylreduction on the β-keto group of
the growing polyketide chain. PKSs incorporate enormous structural diversity into their
products, in addition to varying the condensation cycle, by controlling the overall chain
length, choice of primer and extender units and, particularly in the case of aromatic
polyketides, regiospecific cyclizations of the nascent polyketide chain. After the carbon
chain has grown to a length characteristic of each specific product, it is typically released
from the synthase by thiolysis or acyltransfer. Thus, PKSs consist of families of enzymes
which work together to produce a given polyketide. Two general classes of PKSs exist. One
class, known as Type I PKSs, is represented by the PKSs for macrolides such as
erythromycin. These "complex" or "modular" PKSs include assemblies of several large
multifunctional proteins carrying, between them, a set of separate active sites for each step of
carbon chain assembly and modification (Cortes et al. (1990) Nature 348: 176; Donadio et
al. (1991) Science 252: 675; MacNeil et al. (1992) Gene 115: 119). Structural diversity
occurs in this class from variations in the number and type of active sites in the PKSs. This
class of PKSs displays a one-to-one correlation between the number and clustering of active
sites in the primary sequence of the PKS and the structure of the polyketide backbone. The
second class of PKSs, called Type II PKSs, is represented by the synthases for aromatic
compounds. Type II PKSs typically have a single set of iteratively used active sites (Bibb et
al. (1989) EMBO J. 8: 2727; Sherman et al. (1989) EMBO J. 8: 2717; Fernandez-Moreno
et al. (1992) J. Biol. Chem. 267: 19278).

A "nonribosomal peptide synthase" (NRPS) refers to an enzymatic complex
of eucaryotic or procaryotic origin that is responsible for the synthesis of peptides by a
nonribosomal mechanism, often known as thiotemplate synthesis (Kleinkauf and von
Doehren (1987) Ann. Rev. Microbiol. 41: 259-289). Such peptides, which can be up to 20 or
more amino acids in length, can have a linear, cyclic (cyclosporine, tyrocidine,
mycobacilline, surfactin and others) or branched cyclic structure (polymyxin, bacitracin and
others) and often contain amino acids not present in proteins or amino acids modified
through methylation or epimerization.

A "module" refers to a set of distinctive polypeptide domains that encode all 
the enzyme activities necessary for one cycle of polyketide or peptide chain elongation and 
associated modifications. 
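The notion of a module as an ordered set of catalytic domains can be sketched as a small data structure. The following Python sketch is illustrative only: the class and function names are hypothetical, the domain abbreviations are those used in this document, and the two example domain orders (C-A-PCP and KS-AT-KR-ACP) are assumed for illustration rather than taken from the Blm sequences.

```python
from dataclasses import dataclass

# Domain abbreviations as listed in this document.
KNOWN_DOMAINS = {"C", "A", "PCP", "Cy", "ACP", "Ox", "KS", "AT", "KR", "MT"}


@dataclass(frozen=True)
class Module:
    """One module: the domains needed for one cycle of chain elongation."""
    name: str
    domains: tuple

    def __post_init__(self):
        unknown = set(self.domains) - KNOWN_DOMAINS
        if unknown:
            raise ValueError(f"unknown domain(s): {sorted(unknown)}")


def module_kind(module):
    # Simplifying assumption: a ketoacyl synthase (KS) domain marks a PKS
    # module; anything else is treated as an NRPS module.
    return "PKS" if "KS" in module.domains else "NRPS"


def is_hybrid(assembly_line):
    """True if the ordered modules include at least one PKS and one NRPS module."""
    return {module_kind(m) for m in assembly_line} == {"PKS", "NRPS"}


# Hypothetical two-module assembly line (domain orders assumed for illustration).
nrps_module = Module("NRPS-x", ("C", "A", "PCP"))
pks_module = Module("PKS-x", ("KS", "AT", "KR", "ACP"))
line = [nrps_module, pks_module]

print(len(line), "elongation cycles;", "hybrid" if is_hybrid(line) else "not hybrid")
```

Since each module carries out one elongation cycle, the number of modules in the list corresponds to the number of cycles, and their order to the order of incorporated units.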

The terms "isolated", "purified", or "biologically pure" refer to material which
is substantially or essentially free from components which normally accompany it as found
in its native state. With respect to nucleic acids and/or polypeptides the term can refer to
nucleic acids or polypeptides that are no longer flanked by the sequences typically flanking
them in nature.

The terms "polypeptide", "peptide" and "protein" are used interchangeably
herein to refer to a polymer of amino acid residues. The terms apply to amino acid polymers
in which one or more amino acid residues are an artificial chemical analogue of a
corresponding naturally occurring amino acid, as well as to naturally occurring amino acid
polymers. The term also includes variants on the traditional peptide linkage joining the
amino acids making up the polypeptide.

The terms "nucleic acid" or "oligonucleotide" or grammatical equivalents
herein refer to at least two nucleotides covalently linked together. A nucleic acid of the
present invention is preferably single-stranded or double-stranded and will generally contain
phosphodiester bonds, although in some cases, as outlined below, nucleic acid analogs are
included that may have alternate backbones, comprising, for example, phosphoramide
(Beaucage et al. (1993) Tetrahedron 49(10): 1925 and references therein; Letsinger (1970)
J. Org. Chem. 35: 3800; Sprinzl et al. (1977) Eur. J. Biochem. 81: 579; Letsinger et al. (1986)
Nucl. Acids Res. 14: 3487; Sawai et al. (1984) Chem. Lett. 805; Letsinger et al. (1988) J. Am.
Chem. Soc. 110: 4470; and Pauwels et al. (1986) Chemica Scripta 26: 1419),
phosphorothioate (Mag et al. (1991) Nucleic Acids Res. 19: 1437; and U.S. Patent No.
5,644,048), phosphorodithioate (Briu et al. (1989) J. Am. Chem. Soc. 111: 2321),
O-methylphosphoroamidite linkages (see Eckstein, Oligonucleotides and Analogues: A
Practical Approach, Oxford University Press), and peptide nucleic acid backbones and
linkages (see Egholm (1992) J. Am. Chem. Soc. 114: 1895; Meier et al. (1992) Chem. Int. Ed.
Engl. 31: 1008; Nielsen (1993) Nature 365: 566; Carlsson et al. (1996) Nature 380: 207).
Other analog nucleic acids include those with positive backbones (Denpcy et al. (1995)
Proc. Natl. Acad. Sci. USA 92: 6097); non-ionic backbones (U.S. Patent Nos. 5,386,023,
5,637,684, 5,602,240, 5,216,141 and 4,469,863; Angew. (1991) Chem. Intl. Ed. English 30:
423; Letsinger et al. (1988) J. Am. Chem. Soc. 110: 4470; Letsinger et al. (1994) Nucleoside
& Nucleotide 13: 1597; Chapters 2 and 3, ASC Symposium Series 580, "Carbohydrate
Modifications in Antisense Research", Ed. Y.S. Sanghui and P. Dan Cook; Mesmaeker et al.
(1994) Bioorganic & Medicinal Chem. Lett. 4: 395; Jeffs et al. (1994) J. Biomolecular NMR
34: 17; Tetrahedron Lett. 31: 143 (1996)) and non-ribose backbones, including those
described in U.S. Patent Nos. 5,235,033 and 5,034,506, and Chapters 6 and 7, ASC
Symposium Series 580, "Carbohydrate Modifications in Antisense Research", Ed. Y.S.
Sanghui and P. Dan Cook. Nucleic acids containing one or more carbocyclic sugars are also
included within the definition of nucleic acids (see Jenkins et al. (1995) Chem. Soc. Rev.
pp. 169-176). Several nucleic acid analogs are described in Rawls, C & E News June 2, 1997
page 35. These modifications of the ribose-phosphate backbone may be done to facilitate the
addition of additional moieties such as labels, or to increase the stability and half-life of such
molecules in physiological environments.

The term "heterologous" as it relates to nucleic acid sequences such as coding
sequences and control sequences, denotes sequences that are not normally associated with a
region of a recombinant construct, and/or are not normally associated with a particular cell.
Thus, a "heterologous" region of a nucleic acid construct is an identifiable segment of
nucleic acid within or attached to another nucleic acid molecule that is not found in
association with the other molecule in nature. For example, a heterologous region of a
construct could include a coding sequence flanked by sequences not found in association
with the coding sequence in nature. Another example of a heterologous coding sequence is a
construct where the coding sequence itself is not found in nature (e.g., synthetic sequences
having codons different from the native gene). Similarly, a host cell transformed with a
construct which is not normally present in the host cell would be considered heterologous for
purposes of this invention.

A "coding sequence" or a sequence which "encodes" a particular polypeptide
(e.g. a PKS, an NRPS, etc.), is a nucleic acid sequence which is ultimately transcribed and/or
translated into that polypeptide in vitro and/or in vivo when placed under the control of
appropriate regulatory sequences. In certain embodiments, the boundaries of the coding
sequence are determined by a start codon at the 5' (amino) terminus and a translation stop
codon at the 3' (carboxy) terminus. A coding sequence can include, but is not limited to,
cDNA from procaryotic or eucaryotic mRNA, genomic DNA sequences from procaryotic or
eucaryotic DNA, and even synthetic DNA sequences. In preferred embodiments, a
transcription termination sequence will usually be located 3' to the coding sequence.

Expression "control sequences" refers collectively to promoter sequences,
ribosome binding sites, polyadenylation signals, transcription termination sequences,
upstream regulatory domains, enhancers, and the like, which collectively provide for the
transcription and translation of a coding sequence in a host cell. Not all of these control
sequences need always be present in a recombinant vector so long as the desired gene is
capable of being transcribed and translated.

"Recombination" refers to the reassortment of sections of DNA or RNA
sequences between two DNA or RNA molecules. "Homologous recombination" occurs
between two DNA molecules which hybridize by virtue of homologous or complementary
nucleotide sequences present in each DNA molecule.

The terms "stringent conditions" or "hybridization under stringent conditions"
refer to conditions under which a probe will hybridize preferentially to its target
subsequence, and to a lesser extent to, or not at all to, other sequences. "Stringent
hybridization" and "stringent hybridization wash conditions" in the context of nucleic acid
hybridization experiments such as Southern and northern hybridizations are sequence
dependent, and are different under different environmental parameters. An extensive guide
to the hybridization of nucleic acids is found in Tijssen (1993) Laboratory Techniques in
Biochemistry and Molecular Biology - Hybridization with Nucleic Acid Probes, part I, chapter
2, "Overview of principles of hybridization and the strategy of nucleic acid probe assays",
Elsevier, New York. Generally, highly stringent hybridization and wash conditions are
selected to be about 5°C lower than the thermal melting point (Tm) for the specific sequence
at a defined ionic strength and pH. The Tm is the temperature (under defined ionic strength
and pH) at which 50% of the target sequence hybridizes to a perfectly matched probe. Very
stringent conditions are selected to be equal to the Tm for a particular probe.
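The relationship between Tm and the stringency levels described above amounts to simple arithmetic, sketched here in Python. The function names are hypothetical, and the Tm itself is taken as a given input (measured or estimated elsewhere); no Tm formula is assumed.

```python
def highly_stringent_temperature(tm_celsius, offset_celsius=5.0):
    """Highly stringent conditions: about 5 degrees C below the Tm."""
    return tm_celsius - offset_celsius


def very_stringent_temperature(tm_celsius):
    """Very stringent conditions: equal to the Tm for the particular probe."""
    return tm_celsius


# Example with an assumed Tm of 70 degrees C for some probe.
tm = 70.0
print(highly_stringent_temperature(tm), very_stringent_temperature(tm))
```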

An example of stringent hybridization conditions for hybridization of
complementary nucleic acids which have more than 100 complementary residues on a filter
in a Southern or northern blot is 50% formamide with 1 mg of heparin at 42°C, with the
hybridization being carried out overnight. An example of highly stringent wash conditions is
0.15 M NaCl at 72°C for about 15 minutes. An example of stringent wash conditions is a
0.2x SSC wash at 65°C for 15 minutes (see, Sambrook et al. (1989) Molecular Cloning - A
Laboratory Manual (2nd ed.) Vol. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor
Press, NY, for a description of SSC buffer). Often, a high stringency wash is preceded by a
low stringency wash to remove background probe signal. An example medium stringency
wash for a duplex of, e.g., more than 100 nucleotides, is 1x SSC at 45°C for 15 minutes. An
example low stringency wash for a duplex of, e.g., more than 100 nucleotides, is 4-6x SSC at
40°C for 15 minutes. In general, a signal to noise ratio of 2x (or higher) than that observed
for an unrelated probe in the particular hybridization assay indicates detection of a specific
hybridization. Nucleic acids which do not hybridize to each other under stringent conditions
are still substantially identical if the polypeptides which they encode are substantially
identical. This occurs, e.g., when a copy of a nucleic acid is created using the maximum
codon degeneracy permitted by the genetic code.
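The effect of codon degeneracy can be illustrated with a short Python sketch that translates two different nucleotide sequences under the standard genetic code and compares the resulting polypeptides. The helper names and both example sequences are made up for illustration and are not taken from the sequences of this application.

```python
# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA coding sequence (reading frame 1) into one-letter amino acids."""
    usable = len(dna) - len(dna) % 3  # ignore any trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def mismatches(seq_a, seq_b):
    """Count positions at which two equal-length sequences differ."""
    return sum(1 for x, y in zip(seq_a, seq_b) if x != y)


# Two made-up sequences that differ at every third (wobble) position.
seq_a = "CATCACGCGGCGGCGGATGGC"
seq_b = "CACCATGCCGCTGCAGACGGT"

print(translate(seq_a), translate(seq_b), mismatches(seq_a, seq_b))
```

Both sequences encode the same seven-residue peptide even though they differ at one position in every codon, which is the situation in which two nucleic acids may fail to hybridize under stringent conditions yet encode identical polypeptides.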

A "library" or "combinatorial library" of polyketides and/or polypeptides is
intended to mean a collection of polyketides and/or polypeptides (or other molecules)
catalytically produced by a PKS and/or NRPS and/or hybrid PKS/NRPS (or other possible
combination of synthetic elements) gene cluster. The library can be produced by a gene
cluster that contains any combination of native, homolog or mutant genes from aromatic,
modular or fungal PKSs and/or NRPSs. The combination of genes can be derived from a
single PKS and/or NRPS gene cluster, e.g., act, fren, gra, tcm, whiE, gris, ery, or the like,
and may optionally include genes encoding tailoring enzymes which are capable of
catalyzing the further modification of a polypeptide, polyketide, or other molecule.
Alternatively, the combination of genes can be rationally or stochastically derived from an
assortment of NRPS and/or PKS gene clusters. The library of polyketides and/or
polypeptides and/or other molecules thus produced can be tested or screened for biological,
pharmacological or other activity.

By "random assortment" is intended any combination and/or order of genes,
homologs or mutants which encode for the various PKS and/or NRPS enzymes, modules,
active sites or portions thereof derived from aromatic, modular or fungal PKS and/or NRPS
gene clusters.

By "genetically engineered host cell" is meant a host cell where the native
PKS and/or NRPS gene cluster has been altered or deleted using recombinant DNA
techniques or a host cell into which a heterologous PKS and/or NRPS and/or hybrid
PKS/NRPS gene cluster has been inserted. Thus, the term would not encompass mutational
events occurring in nature. A "host cell" is a cell derived from a procaryotic microorganism
or a eucaryotic cell line cultured as a unicellular entity, which can be, or has been, used as a
recipient for recombinant vectors bearing the PKS, NRPS, and/or hybrid gene clusters of the
invention. The term includes the progeny of the original cell which has been transfected. It
is understood that the progeny of a single parental cell may not necessarily be completely
identical in morphology or in genomic or total DNA complement to the original parent, due
to accidental or deliberate mutation. Progeny of the parental cell which are sufficiently
similar to the parent to be characterized by the relevant property, such as the presence of a
nucleotide sequence encoding a desired PKS, are included in the definition, and are covered
by the above terms.

Expression vectors are defined herein as nucleic acid sequences that direct the
transcription of cloned copies of genes/cDNAs and/or the translation of their mRNAs in
an appropriate host. Such vectors can be used to express genes or cDNAs in a variety of
hosts such as bacteria, blue-green algae, plant cells, insect cells and animal cells. Expression
vectors include, but are not limited to, cloning vectors, modified cloning vectors, specifically
designed plasmids or viruses. Specifically designed vectors allow the shuttling of DNA
between hosts, such as bacteria-yeast or bacteria-animal cells. An appropriately constructed
expression vector preferably contains: an origin of replication for autonomous replication in
a host cell, a selectable marker, optionally one or more restriction enzyme sites, and optionally
one or more constitutive or inducible promoters. In preferred embodiments, an expression
vector is a replicable DNA construct in which a DNA sequence encoding one or more PKS
and/or NRPS domains and/or modules is operably linked to suitable control sequences
capable of effecting the expression of the products of these synthases and/or synthetases in a
suitable host. Control sequences include a transcriptional promoter, an optional operator
sequence to control transcription and sequences which control the termination of
transcription and translation, and so forth.



A "bleomycin open reading frame", or "bleomycin ORF", or "BLM Orf" 
refers to a nucleic acid open reading frame that encodes a polypeptide or polypeptide domain 
that has an enzymatic activity used in the biosynthesis of a bleomycin. 

A "PKS/NRPS/PKS" system refers to a synthetic system comprising an NRPS
flanked by two PKSs. A "NRPS/PKS/NRPS" system refers to a synthetic system comprising
a PKS flanked by two NRPSs. A "hybrid PKS/NRPS system" or a "hybrid NRPS/PKS
system" refers to a hybrid synthetic system comprising at least one PKS and one NRPS
module. The system can comprise multiple modules and the order can vary.

A "biological molecule that is a substrate for a polypeptide encoded by a
bleomycin biosynthesis gene" refers to a molecule that is chemically modified by one or
more polypeptides encoded by open reading frame(s) of the blm gene cluster. The
"substrate" may be a native molecule that typically participates in the biosynthesis of a
bleomycin, or can be any other molecule that can be similarly acted upon by the polypeptide.

A "polymorphism" is a variation in the DNA sequence of some members of a
species. A polymorphism is thus said to be "allelic," in that, due to the existence of the
polymorphism, some members of a species may have the unmutated sequence (i.e., the
original "allele") whereas other members may have a mutated sequence (i.e., the variant or
mutant "allele"). In the simplest case, only one mutated sequence may exist, and the
polymorphism is said to be diallelic. In the case of diallelic diploid organisms, three
genotypes are possible. They can be homozygous for one allele, homozygous for the other
allele or heterozygous. In the case of diallelic haploid organisms, they can have one allele or
the other, thus only two genotypes are possible. The occurrence of alternative mutations can
give rise to triallelic, etc. polymorphisms. An allele may be referred to by the nucleotide(s)
that comprise the mutation.
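
The genotype counts stated above follow from counting unordered combinations of alleles. The following is a minimal Python sketch of that count; the allele labels and the helper name `genotypes` are hypothetical and serve only as an illustration.

```python
from itertools import combinations_with_replacement


def genotypes(alleles, ploidy):
    """Return the distinct unordered genotypes for the given alleles and ploidy."""
    return list(combinations_with_replacement(sorted(alleles), ploidy))


# Hypothetical allele labels for a diallelic polymorphism.
diallelic = ["A", "G"]

diploid = genotypes(diallelic, ploidy=2)   # two homozygotes plus one heterozygote
haploid = genotypes(diallelic, ploidy=1)   # one allele or the other

print(len(diploid), diploid)
print(len(haploid), haploid)

# A triallelic polymorphism (hypothetical third allele "T") in a diploid organism.
print(len(genotypes(["A", "G", "T"], ploidy=2)))
```

For n alleles, the diploid count is n(n+1)/2, which gives three genotypes for the diallelic case and six for the triallelic case.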

"Single nucleotide polymorphisms" or "SNPs" are defined by their
characteristic attributes. A central attribute of such a polymorphism is that it contains a
polymorphic site, "X," most preferably occupied by a single nucleotide, which is the site of
the polymorphism's variation (Goelet and Knapp U.S. patent application Ser. No.
08/145,145). Methods of identifying SNPs are well known to those of skill in the art (see,
e.g., U.S. Patent 5,952,174).

The following abbreviations are used herein: A, adenylation; ACP, acyl
carrier protein; AT, acyltransferase; BLM, bleomycin; C, condensation; Cy,
condensation/cyclization; KR, ketoreductase; KS, ketoacyl synthase; MT, methyltransferase;
NRPS, nonribosomal peptide synthetase; orf, open reading frame; Ox, oxidation; PCP,
peptidyl carrier protein; PCR, polymerase chain reaction; PKS, polyketide synthase; Sv,
Streptomyces verticillus; ArCP, aryl carrier protein; bp, base pair; CoA, coenzyme A; DTT,
dithiothreitol; FAS, fatty acid synthase; kb, kilobase; PPTase, 4'-phosphopantetheinyl
transferase; TCA, trichloroacetic acid; and DEBS, 6-deoxyerythronolide B synthase.

BRIEF DESCRIPTION OF THE DRAWINGS

Figures 1A and 1B illustrate the biosynthetic pathway for bleomycin in Sv
(ATCC 15003). Figure 1A illustrates a biosynthetic pathway for BLM in Sv ATCC 15003;
intermediates except those in brackets were identified. Figure 1B shows a linear model for
the Blm megasynthetase-templated assembly of the BLM peptide/polyketide/peptide
aglycone from nine amino acids and one acetate; shaded circles represent atypical domains
carrying out the proposed novel chemistry, and arrows with broken lines indicate where
biosynthetic intermediates were derailed. Three-letter amino acid designations were used.
[HO], hydroxylation; [H], reduction.

Figure 2 provides a restriction map and gene organization of the blm gene
cluster from Sv ATCC 15003 (B, BamHI). Proposed functions for individual open reading
frames are summarized in Tables I and II. Modules for individual NRPS and PKS are
given along with their proposed substrates in parentheses.

Figures 3A, 3B, 3C, and 3D illustrate the determination of substrate
specificity for NRPS-1 and NRPS-6. Figure 3A shows a comparison of the A3 to A6 region
of A domains to 84 NRPS modules available at GenBank that activate various amino acids.
Figure 3B shows a comparison of amino acid residues that putatively line the substrate
binding pockets for A domains (single-letter amino acid designations were used). The
number following the protein name indicates the order of a particular A domain in the
multimodular NRPS protein. The protein accession numbers are P48663 (HMWP2), P19828
(AngR), AAC06346 (BacA-2), CAB03756 (MbtB), 3510629 (SyrE-7), 3114612 (AcmB-1),
CAA67248 (SnbC-1), and 3560507 (FxbC-2). Dhb stands for 2,3-dehydroaminobutyric
acid. It is not known if Dhb is the direct substrate for SyrE-7 or resulted from dehydration of
a SyrE-7 activated Thr (Guenzi et al. (1998) J. Biol. Chem. 273: 32857-32863). Figure 3C
illustrates purified proteins after overexpression in E. coli as analyzed by electrophoresis on
a 10% SDS-polyacrylamide gel (the calculated molecular weights for NRPS-1A and NRPS-6A
are 64,212 and 61,899, respectively). Figure 3D illustrates substrate specificities as
determined by the ATP-PPi exchange reaction with the amino acids of BLM as substrates
(100% relative activity corresponds to 103,000 cpm for NRPS-1A and 256,000 cpm for
NRPS-6A).

Figure 4 illustrates a three-module NRPS/PKS/NRPS model for channeling
the growing intermediate between NRPS and PKS modules and between PKS and NRPS
modules. The KS, ACP, and C domains are shaded to emphasize their unique activities that
are responsible for elongating a growing peptide with a short carboxylic acid and a growing
polyketide with an amino acid in hybrid peptide/polyketide/peptide biosynthesis.

Figure 5 illustrates the use of the blmVIII methyltransferase domain to introduce
branched methyl groups in a polyketide synthesis. PCK12 has been described by Kao et al.
(1995) J. Am. Chem. Soc. 117: 9105-9106. DE-1, DE-2 and DE-3 are three representative
products demonstrating the strategy and utility of blmVIII in introducing a CH3 group in
polyketide biosynthesis.

Figure 6 illustrates the use of the blm NRPS and PKS enzymes to synthesize a
variety of hybrid polyketide/peptide molecules including, but not limited to, a family of
oxazolines/oxazoles, and thiazolines/thiazoles.

Figure 7 illustrates the use of elements of the blm gene cluster to synthesize
various sugars.

Figure 8A shows a restriction map of the blm gene cluster from Sv
ATCC 15003 (B, BamHI). Figure 8B shows the relative position of the blmI, blmII, and blmXI
genes to the two blmAB resistance genes (blm R, Blm resistance). Individual open reading
frames are represented by open arrows. Figure 8C shows the nucleotide sequence of the
blmI gene. The potential ribosome-binding site (RBS) and the conserved motif for 4'-
phosphopantetheinylation are underlined. The sequence has been deposited into GenBank
under accession no. .

Figure 9 shows an amino acid sequence comparison of BlmI with PCP
domains of known type I NRPSs (Grs-2 [P14688], 36% identity, 58% similarity; Srfa-3
[Q08787], 40% identity, 64% similarity; Vir-s [Y11547], 36% identity, 60% similarity; Saf-b
[U24657], 40% identity, 54% similarity). Given in brackets are nucleotide sequence
accession numbers. The shaded letters indicate similar amino acids. Consensus residues are
amino acids that are similar in more than three sequences. The signature motif for 4'-
phosphopantetheinylation is underlined.

Figures 10A and 10B show the HPLC analysis of BlmI purified from E. coli
OG7001(pBS2) (Fig. 10A), and E. coli OG7001(pBS2/pDPT-Gsp) (Fig. 10B).

Figure 11 shows the enzyme architecture of type I and type II PKS and
NRPS. A, adenylation domain; ACP, acyl carrier protein or ACP domain; AT, acyl
transferase; C, condensation protein or C domain; KS, β-ketoacyl synthase domain; KSα,
β-ketoacyl synthase α subunit; KSβ, β-ketoacyl synthase β subunit; PCP, peptidyl carrier
protein or PCP domain.

Figure 12 illustrates the reaction catalyzed by phosphopantetheinyl 
transferases (PPTases). 

Figure 13 shows a restriction map and gene organization of the pptA locus
from Sv ATCC 15003.

DETAILED DESCRIPTION

Polyketides and polypeptides can be assembled in a remarkably similar
manner by repetitive addition of an extending unit to a growing chain by polyketide
synthases (PKS) and nonribosomal peptide synthetases (NRPS), respectively. In the case of
polyketides, the extending unit is typically a fatty acid (activated as an acyl CoA thioester)
while the extending unit for polypeptides is typically an amino acid (activated as an
aminoacyl adenylate). Both the PKS and NRPS systems have evolved a modular
organization to define the number, sequence, and specificity of the incorporation of the
extending unit and utilize the 4'-phosphopantetheine prosthetic group to channel the
growing intermediate during the elongation process.

This invention pertains to the discovery that a PKS-bound growing polyketide
intermediate can be further elongated by an NRPS module, or conversely, an NRPS-bound
growing polypeptide intermediate can be further elongated by a PKS module. This
discovery permits the exploitation of NRPS, PKS, and hybrid NRPS/PKS systems to provide
a number of novel hybrid peptide/polyketide metabolites from amino acids and short fatty
acids.

It was also a discovery of this invention that this hybrid NRPS/PKS/NRPS
system is exemplified by the bleomycin (Blm) biosynthesis pathway in Streptomyces
verticillus (Sv) (ATCC 15003). The bleomycins are a family of glycopeptide-derived
antibiotics originally isolated by Umezawa in 1966 from the fermentation broth of S.
verticillus. Bleomycins (BLMs) exhibit strong anti-tumor activity and are currently used in the
treatment of lymphoma, particularly Hodgkin's disease, testicular tumors, squamous cell
carcinomas of skin, head, cervix, penis, rectum, and for intracavitary therapy of malignant
effusions in ovarian and breast cancer. The commercial product, Blenoxane®, contains
BLM A2 and B2 as the principal constituents. Almost uniquely among anticancer drugs,
BLM does not cause myelosuppression, promoting its wide application in combination
chemotherapy.

In one aspect, this invention provides a cloned and characterized BLM gene
cluster consisting of characteristic NRPS and PKS genes from the Blm producer
Streptomyces verticillus (ATCC 15003). The cloned and isolated Blm gene cluster provides a
method of recombinantly expressing bleomycin and/or bleomycin analogues. Thus, in one
embodiment, this invention provides for nucleic acids encoding bleomycin synthetic
machinery or subunits thereof, for cells recombinantly modified to express a bleomycin
and/or bleomycin analogue, and for a bleomycin or bleomycin analogue recombinantly
expressed in such cells.

Like other polyketide synthases or nonribosomal peptide synthetases, the
bleomycin synthetic pathway is organized into modules, each module catalyzing the addition
and/or modification of one subunit (e.g. fatty acid or amino acid). Each module is organized
into a number of domains, each domain having a characteristic activity (e.g. activation,
condensation, condensation/cyclization, etc.). The catalytic domains within a module and
the modules themselves are often arranged collinearly, and the order of biosynthetic modules
from NH2- to COOH-terminus on each PKS and NRPS polypeptide and the number and type
of catalytic domains within each determine the order of structural and functional elements in
the resulting product. The size and complexity of the ultimately formed product are
controlled by the number of repeated acyl chain extension steps that are, in turn, a function
of the number and placement of carrier protein domains in these multimodular enzymes.
The number, composition, and order of such domains can be altered either to introduce
modifications, e.g. into the bleomycin to produce bleomycin analogues, or to produce
different or completely new molecules. Such "recombination" is not restricted solely to
recombination among the bleomycin catalytic domains and/or modules, but can also involve
recombination between bleomycin modules and/or subunits and other PKS and/or NRPS
modules and/or subunits. Moreover, the discovery that synthetic pathways can incorporate
both PKS and NRPS modules and/or catalytic domains makes available hybrid PKS/NRPS
syntheses.

Thus, in one embodiment this invention contemplates the use of blm gene
cluster modules and/or catalytic domains to make various peptide and/or polyketide, and/or
hybrid polypeptide/polyketide metabolites (including, but not limited to bleomycin
intermediates or shunt metabolites), in combinatorial biosynthesis with other polyketide
synthases and/or other nonribosomal peptide synthetases.

The blm gene cluster contains several glycosylases which can be used alone
or in context with other PKS and/or NRPS modules or catalytic domains to make various
metabolites with sugars associated with bleomycins (bleomycin sugars).

In addition, the blm gene cluster includes a novel methyltransferase domain
that can be used to make polyketide metabolites with methyl branch(es).

The blm gene cluster also is characterized by the unusual Cy domains as well
as the unprecedented Ox domain (see, e.g. BlmIV and BlmIII NRPSs), providing an efficient
biosynthesis for a bithiazole structure. The blm gene cluster, blm modules, or blm catalytic
domains can be used either individually or collectively (alone or in combinations with other
nonribosomal peptide synthetases or polyketide synthases) to make thiazolidine, thiazoline
and thiazole, bi-thiazolidine, bithiazoline, and bithiazole-containing microbial metabolites.

Other uses include, but are not limited to, the usage of the blm gene
cluster/modules/catalytic units (either individually or collectively) or the Blm model to make
heterocyclic ring-containing microbial metabolites, such as five-member S- and N-
containing compounds of the thiazolidine, thiazoline and thiazole family or the O- and N-
containing compounds of the oxazolidine, oxazoline, and oxazole family, or to make sugars,
such as L-sugars (with the BlmG epimerase), sugars modified by a carbamoyl group (with
BlmD), and disaccharides.

This invention also includes the discovery of a novel discrete PCP protein
(encoded by the blmI gene). Apo-BlmI can be efficiently modified into holo-BlmI either in
vivo or in vitro by PCP-specific 4'-phosphopantetheine transferases (PPTases) such as Gsp
and Sfp. Unlike the PCP domains in type I NRPSs, BlmI lacks its cognate A domain and can
be aminoacylated by Val-A, an A domain from a completely unrelated type I NRPS. BlmI,
therefore, represents the first characterized type II PCP, providing the genetic and
biochemical evidence to support the existence of a type II NRPS. The latter system is
useful, in a manner analogous to the type I NRPS, i.e., modular NRPS, in the combinatorial
manipulation of NRPS proteins to generate novel peptides. This invention also includes the
discovery and characterization of a novel PPTase (encoded by the pptA gene in Figure 13).
This PPTase can be used in engineered biosynthesis of polyketides, peptides, hybrid peptide
and polyketide metabolites, hybrid polyketide and peptide metabolites, or the combination of
both types of metabolites. The PPTase can also be used in converting apo-peptidyl carrier
proteins (both type I and type II) and acyl carrier proteins (both type I and type II) into the
holo-proteins.

The Examples provided herein and the accompanying primers permit one of
ordinary skill in the art to isolate the blm gene cluster of this invention, its constituent ORFs,
various modules, or enzymatic domains. The isolated nucleic acid components can be used
to express one or more polypeptide components for in vivo (e.g. recombinant) synthesis of
one or more polypeptides and/or polyketides as indicated above. It will also be appreciated
that the blm cluster polypeptides can be used for ex vivo assembly of various
macromolecules.

I. BLM gene cluster and the PPTase gene.

A) The BLM gene cluster. 

The nucleic acids comprising the blm gene cluster are identified in Tables I
and II and listed in the sequence listing provided herein (SEQ ID NOS: 1 and 2, GenBank
Accession numbers AF149091, AF210249, AF210311). In particular, Table I identifies
genes and functions of open reading frames (ORFs) responsible for the biosynthesis of the
hybrid peptide/polyketide/peptide backbone and sugar moieties of bleomycin, while Table II
identifies a number of ORFs comprising the blm gene cluster, identifies the activity of the
catalytic domain encoded by each ORF and provides primers for the amplification and
isolation of that ORF.
As illustrated in Example 1, the blm cluster comprises a PKS module, flanked
by several NRPS modules along with several sugar biosynthesis genes and genes encoding
other biosynthesis enzymes as well as several resistance and regulatory genes (Table I).



Table I. Determined functions of ORFs in the bleomycin biosynthesis gene cluster

Gene | Amino acids | Sequence homolog (1) | Proposed function (2, 3)
orf8 | 424 | YqeR (BAA12461) | Oxidase
blmC | 498 | RfaE (AA07904.1) | NDP-glucose synthase
blmI | 90 | GrsB (P14688) | Type II PCP
blmD | 545 | NodU (Q53515) | Carbamoyl transferase
blmE | 390 | RfaF (AAD16056) | Glycosyl transferase
orf13 | 187 | MbtH (005821) | Unknown
blmII | 462 | Nip (CAA98937) | NRPS condensation enzyme
orf15 | 339 | SyrP (1890776) | Regulation
blmIII | 935 | HMWP2 (P48633), McbC (P23185) | A PCP Ox
blmIV | 2626 | HMWP2 (P48633) | [illegible]
orf18 | 638 | AsnB (2293165) |
blmF | 494 | RfbC (Q50864)/BlmOrf1 (507319) | Glycosyl transferase/β-hydroxylase
blmG | 325 | YtcB (2293288) | Sugar epimerase
blmV | 645 | McyB (2708278) | PCP C
blmVI | 2675 | ACoAS (1658531), PksD ([illegible]), SnbDE (CAA67249) | A 4 ACP C A PCP C A
blmVII | 1218 | SyrE (3510629) | C A PCP
blmVIII | 1841 | HMWP1 (CAA73127) | KS AT MT KR ACP
blmIX | 1066 | SafB (1171128) | C A PCP
blmX | 2140 | TycC (2623773) | C A PCP C A PCP
blmXI | 688 | SyrE (3510629) | NRPS condensation enzyme
orf28 | 239 | SC9C7.04C (CAA22716) | Unknown
orf29 | 582 | YvdB (CAB08068) | Transmembrane transporter
orf30 | 113 | SmtB (P30340) | Regulation
orf31 | 117 | PhnA (P16680) | Unknown

1. Protein accession numbers are given in parentheses. 2. Underlined domains contain motifs that are clearly
different from known NRPS or PKS domains. 3. This A domain lacks the typical NRPS A1, A2, A4, A8, and
A9 motifs and more closely resembles acyl CoA synthases. ORF1 to ORF7 were reported by Schmidt (1994)
Gene 151:17-21, who assigned ORF2 as blmA and ORF4 as blmB.



Noteworthy are the genes encoding the NRPS and PKS enzymes. The blmI,
blmII, and blmXI genes encode NRPSs with an unusual architecture. In contrast to all known
NRPSs, which are of modular organization with each module consisting minimally of a
condensation (C), an adenylation (A), and a peptidyl carrier protein (PCP) domain, BlmI,
BlmII, and BlmXI are discrete proteins homologous to individual domains of type I NRPSs.
We have characterized BlmI as a type II PCP (Du and Shen (1999) Chem. Biol. 6: 507-517).
The BlmII and BlmXI proteins can serve as candidates for type II condensation enzymes.

The blmIII, blmIV, blmV, blmVI, blmVII, blmIX, and blmX genes encode
modular NRPSs consisting of domains characteristic for known type I NRPSs, such as the A,
PCP, C, and condensation/cyclization (Cy) domains, as well as an unprecedented oxidation
(Ox) domain. BlmVI is unique among all the Blm NRPSs identified. Its N-terminal module
(NRPS-5) consists of an atypical A domain, which bears a close resemblance to a family of
acyl CoA synthases (Fitzmaurice and Kolattukudy (1997) J. Bacteriol. 179: 2608-2615;
Fitzmaurice and Kolattukudy (1998) J. Biol. Chem. 273: 8033-8039), and an acyl carrier
protein (ACP)-like domain. Its C-terminal module is truncated and presumably interacts
with BlmV to constitute the complete NRPS-3 module (Fig. 1B). Also noteworthy are the C
domain of NRPS-3 that lacks both His residues of the conserved HHxxxDG (SEQ ID NO: 4)
active site for transpeptidation (Stachelhaus et al. (1998) J. Biol. Chem. 273: 22773-22781)
and the extra C domain at the C-terminus of BlmV. These unusual features associated with
BlmVI and BlmV may play roles in the formation of the β-aminoalaninamide and the
pyrimidine moieties of BLM, which are unprecedented in peptide biosynthesis.

The blmVIII gene encodes a PKS module consisting of domains characteristic
for known PKSs, such as ketoacyl synthase (KS), acyltransferase (AT), ketoreductase (KR),
and ACP, with malonyl CoA acting as an extending unit according to sequence comparison
of the AT domain (Haydock et al. (1995) FEBS Lett. 374: 246-248) (Fig. 1B).

The identification of an integrated methyltransferase (MT) domain in the
middle of BlmVIII is unique, representing the first PKS from actinomycetes that contains an
internal MT domain.



Table II. Blm gene cluster open reading frames (ORFs) and primers for ORF amplification.

Orf# | Position | Activity | Method | Primers (F: forward; R: reverse) | Seq ID No
orf-8 | 76183-77457 | Oxygen-independent coproporphyrinogen III oxidase | Gapped-blast comparison (1) | F: ATGAGCCACGCCATCGGA; R: TCAGGCGCGTTCGGGGGC | 5
orf-9 | 74690-76186 | ADP-heptose synthase (blmC) | Gapped-blast comparison (1) | F: GTGAACACCGACCTGCCC; R: TCATGGGGTGTCTCCCTC | 7, 8
orf-10 | 74421-74693 | Peptidyl carrier protein (blmI) | Expression and biochemical characterization (2) | F: ATGAGCGCCCCGCGGGGC; R: TCACCGGTCCCGCTCCCC | 9, 10
orf-11 | 72787-74424 | Carbamyltransferase (blmD) | Gapped-blast comparison (1) | F: ATGAGCGCCGACCCGTCC; R: TCATGAGCGGGCCGCCGT | 11, 12
orf-12 | 71618-72790 | ADP-heptose:LPS heptosyl transferase (blmE) | Gapped-blast comparison (1) | F: ATGACCACCCCCATGACC; R: TCATGGGGTACTCCTGAT | 13, 14
orf-13 | 70983-71546 | Homolog of mbtH in the synthesis of mycobactin | Gapped-blast comparison (1) | F: ATGACCACGACCCCGCGG; R: TCAGGTGCCGGACACGCG | 15, 16
orf-14 | 69598-70986 | Peptide synthetase (condensation, blmII) | Gapped-blast comparison (1) | F: GTGACCGCCCCCGGCACA; R: TCATCGGTGGCTCCTCGT | 17, 18
orf-15 | 68582-69601 | Regulatory gene (homolog of syrP) | Gapped-blast comparison (1) | F: GTGAACCGGCACGGCCCC; R: TCACGCGCTCACCTCGTC | 19, 20
orf-16 | 65778-68585 | Mutated peptide synthetase-oxidase (NRPS-0, blmIII) | Gapped-blast comparison (1) | F: GTGACGAGCGCCCGGCCC; R: TCACGGGGCCTCCGTGCG | 21, 22
orf-17 | 57901-65781 | Peptide synthetase (NRPS-2-1, blmIV) | Expression and biochemical characterization (2) | F: ATGCTGCACGGCGCCGCG; R: TCACTCCGGTCCACCTCC | 23, 24
orf-18 | 55899-57815 | Asparagine synthetase | Gapped-blast comparison (1) | F: GTGAGGCCCGTGTGCGGC; R: TCAGCCACCGTTGCCGCC | 25, 26
orf-19 | 54418-55902 | Homolog of hydroxylase-dehydrogenase (blmF) | Gapped-blast comparison (1) | F: GTGAAGGACCTCGGCCGG; R: TCACTCCCCCGGTGCCGG | 27, 28
orf-20 | 53427-54404 | Nucleotide-sugar epimerase (blmG) | Gapped-blast comparison (1) | F: GTGACATGGACCGTGGTG; R: TCAGGCATCGGCCCTCCC | 29, 30
orf-21 | 51493-53430 | Peptide synthetase (NRPS-3CT, blmV) | Gapped-blast comparison (1) | F: ATGCGCGGGCATGACGAC; R: TCACGGTGTCTCTCCCTC | 31, 32
orf-22 | 43263-51290 | Peptide synthetase (NRPS-5-4-3, blmVI) | Expression and biochemical characterization (2) | F: ATGAGCCGGCCGGCCGGC; R: TCATGCTCGGTCATCGCC | 33
orf-23 | 39610-43266 | Peptide synthetase (NRPS-6, blmVII) | Expression and biochemical characterization (2) | F: GTGACCACGCCCCGCATC; R: TCATTCGGGACGCGGGCA | 35
orf-24 | 34088-39613 | Polyketide synthase (blmVIII) | Gapped-blast comparison (1) | F: ATGAGCCATGCCGACGCG; R: TCACAGCACCACCTCTTC | 37, 38
orf-25 | 30891-34091 | Peptide synthetase (NRPS-7, blmIX) | Gapped-blast comparison (1) | F: ATGACCCCGGCCGCCGAC; R: TCATCGTCCGCCGCCTTT | 39, 40
orf-26 | 24406-30894 | Peptide synthetase (NRPS-9-8, blmX) | Gapped-blast comparison (1) | F: ATGCCTCGGTGTGCCCGA; R: TCATTCGGCGGCACCTCC | 41, 42
orf-27 | 22127-24193 | Peptide synthetase (condensation, blmXI) | Gapped-blast comparison (1) | F: GTGGGTTTCCGTCGAGCG; R: TTACACCCTCCGTTTCTC | 43, 44
orf-28 | 21367-22086 | Phosphatidylserine decarboxylase | Gapped-blast comparison (1) | F: ATGGCACAGGACCTGAAC; R: TCAACGCCACCGGATCTT | 45, 46
orf-29 | 19161-20909 | Transmembrane transporter | Gapped-blast comparison (1) | F: GTGAGCTCCCTCGCCGTC; R: TCATCGTCGGGCACTCGG | 47, 48
orf-30 | 18823-19164 | Metal dependent regulatory element | Gapped-blast comparison (1) | F: GTGCCGGTTCCGCTGTAT; R: TCACCGGGCACTGACCTC | 49, 50
orf-31 | 18660-18307 | PHNA homolog | Gapped-blast comparison (1) | F: GTGACCGAGAACCTTCCG; R: TCAGACCTTCTTGACCAC | 51, 52
orf-32 | 17736-9211 | Peptide synthetase (NRPS-11-10) | Gapped-blast comparison (1) | F: ATGGCCTCAGACGCTTTG; R: TCATTGAGACTCCTCCTC | 53, 54
orf-33 | 9214-7859 | Putative transporter | Gapped-blast comparison (1) | F: ATGATGAAGTCAAGCCGC; R: TCAGTGGCTTACAAGGAG | 55, 56
orf-34 | 7797-6784 | Homolog of clavaminic acid synthase | Gapped-blast comparison (1) | F: ATGACTGACCTGCCGTTG; R: TCACACCAGCAGCGAGGT | 57, 58
orf-35 | 6773-6021 | Thioesterase | Gapped-blast comparison (1) | F: ATGGATTTCCCCCTCACC; R: TCATGCCCCTACCTCGGC | 59, 60
orf-36 | 6024-4741 | Putative transporter | Gapped-blast comparison (1) | F: ATGACCGCGCGCGTCGAC; R: TCACTCCTCGGCTTCGGC | 61, 62
orf-37 | 4733-3915 | Unknown | Gapped-blast comparison (1) | F: GTGTCCAAGAACGCGGCG; R: TCATCGGCTCGCCTCGTG | 63, 64
orf-38 | 3918-2182 | Peptide synthetase (NRPS-12) | Gapped-blast comparison (1) | F: ATGACCCTCACCCTGCGG; R: TCACTCGGGCACTCCTTC | 65, 66
orf-39 | 2185-1199 | Regulatory gene (homolog of SyrP) | Gapped-blast comparison (1) | F: GTGACCGGTTCCGTAACG; R: TCATGAGTCCGCCGAGGT | 67, 68
orf-40 | 1015-1 | Peptide synthetase | Gapped-blast comparison (1) | F: ATGACAGAGGTCCGAGGT; R: CCCGGCAACCGCCCTCCC | 69, 70
orf-41 | On a separate sequence | 4'-phosphopantetheinyl transferase (pptA) | Expression and biochemical characterization (2) | F: GTGATCGCCGCCCTCCTG; R: TTACGGGACGGCGGTCCG | 71, 72




The Blm megasynthetase comprises nine NRPS modules and one PKS
module forming a hybrid NRPS/PKS/NRPS megasynthetase (Fig. 1A). Inspection of the blm
gene cluster (Fig. 2) showed that the Blm NRPS and PKS modules apparently are not
organized according to the "colinearity rule" for BLM biosynthesis (Fig. 1). Detailed
functional organization of the megasynthetase and the BLM synthetic pathway is provided in
Example 1.
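
The ORF positions in Table II can be cross-checked against the amino acid counts in Table I. The following Python sketch assumes that the listed positions are inclusive nucleotide coordinates and that each ORF includes its stop codon, so that an ORF of n nucleotides encodes n/3 - 1 residues; the function name and the selection of rows are illustrative only.

```python
def encoded_residues(start, end):
    """Residues encoded by an ORF given inclusive coordinates (stop codon included)."""
    length = abs(end - start) + 1
    if length % 3 != 0:
        raise ValueError(f"ORF length {length} is not a multiple of 3")
    return length // 3 - 1


# (Table II position, Table I amino acid count) for a few ORFs.
rows = {
    "orf-8": ((76183, 77457), 424),
    "orf-10 (blmI)": ((74421, 74693), 90),
    "orf-11 (blmD)": ((72787, 74424), 545),
    "orf-22 (blmVI)": ((43263, 51290), 2675),
    "orf-24 (blmVIII)": ((34088, 39613), 1841),
}

for name, ((start, end), table_one_count) in rows.items():
    computed = encoded_residues(start, end)
    status = "agrees" if computed == table_one_count else "differs"
    print(f"{name}: {computed} residues from position, Table I lists {table_one_count} ({status})")
```

Under these assumptions the two tables agree for the rows shown, which also helps confirm which Table I entry corresponds to which ORF number in Table II.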

B) PPTase

This invention also provides the gene (pptA, Fig. 13) encoding
phosphopantetheine transferase (PPTase) (GenBank Accession No: AF210311) (see, SEQ ID
NO: 3). PPTase converts carrier proteins for the growing acyl chain from inactive apo-forms
to functional holo-forms by the covalent attachment of the 4'-phosphopantetheine moiety of
coenzyme A to a conserved serine residue of the carrier-protein substrate (see, e.g., Fig. 1A).

Using the sequence information provided herein (e.g. primer sequences and
PPTase sequence) the PPTase nucleic acids can be routinely isolated according to standard
methods (e.g. PCR amplification). Detailed protocols for the isolation of the PPTase are
provided in Example 3.

Other PPTases can be identified using the probes and primers illustrated in
Example 3. Briefly, a primer to the THC motif (5'-C GGC ATG GTC GGC TCC HTN
CAN CAY TG-3', SEQ ID NO: 73, where H = C + A, N = A + C + T + G, Y = C + T, K = G
+ T, R = A + G, W = T + A) is paired with a primer designed around the typical C-terminal
PPTase motif (e.g., KEA-1: 5'-T GCA GCA GAA CAG GAG GCK NYC CCA NKG-3', SEQ ID
NO: 74, and KEA-2: 5'-TG GGT CAG CGG GTA CCA NRC YTT RWA-3', SEQ ID NO:
75). Using S. verticillus chromosomal DNA as template and the primer set THC/KEA-2,
a probe (about 250 bp) can be amplified that specifically binds to a PPTase. Libraries of
organisms comprising NRPS, PKS, and/or hybrid PKS/NRPS pathways can be probed for
the presence of a PPTase sequence. Once hybridizing clones are identified, the PPTase
sequence can be isolated according to standard methods well known to those of skill in the art
(see, e.g., Example 3).
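
Because the THC primer contains degenerate positions, it denotes a pool of distinct oligonucleotides. The following Python sketch counts and enumerates that pool using the degeneracy code exactly as defined in the text above (note that the text defines H as C + A); the function names are illustrative only.

```python
from itertools import product

# Degeneracy code as defined in the text (H = C + A, N = A + C + T + G, etc.).
CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "H": "CA", "N": "ACTG", "Y": "CT",
    "K": "GT", "R": "AG", "W": "TA",
}


def degeneracy(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    count = 1
    for base in primer:
        count *= len(CODES[base])
    return count


def expand(primer):
    """Enumerate every concrete sequence represented by a degenerate primer."""
    return ["".join(bases) for bases in product(*(CODES[b] for b in primer))]


# THC primer from the text, written without the codon spacing.
THC = "CGGCATGGTCGGCTCCHTNCANCAYTG"

print(len(THC), degeneracy(THC))
print(expand(THC)[:3])
```

Under the stated code, the four degenerate positions of the THC primer (H, N, N, Y) give a pool of 2 x 4 x 4 x 2 = 64 sequences.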

C) Isolation/preparation of nucleic acids.

In one embodiment, this invention provides nucleic acids for the recombinant
expression of a bleomycin. Such nucleic acids include isolated gene cluster(s) comprising
open reading frames encoding polypeptides sufficient to direct the assembly of a bleomycin.
In other embodiments of this invention, modified bleomycins (e.g. bleomycin
analogs), novel polyketides, polypeptides, and combinations thereof (polyketide/polypeptide
hybrids) are created by modifying PKSs and/or NRPSs so as to introduce variations into
known polymers synthesized by the enzymes. Such variations may be introduced by design,
for example to modify a known molecule in a specific way, e.g. by replacing a single
monomeric unit within a polymer with another, thereby creating a derivative molecule of
predicted structure. Alternatively, variations can be made randomly, for example by making
a library of molecular variants of a known polymer by systematically or haphazardly
replacing one or more modules or enzymatic domains in a known PKS or NRPS with a
collection of alternative modules or domains. Production of alternative/modified PKSs,
NRPSs and hybrid systems is described below.

Using the primer and sequence information provided herein, one of ordinary 
skill in the art can routinely isolate/clone the PKS and/or NRPS modules and/or enzymatic 
domains described herein. For example, the PCR primers provided in Table II, above, can 
be used to amplify any of the ORFs identified therein. Moreover, using the sequence
information for the blm gene cluster provided herein, the design of other primers suitable for
the amplification of individual ORFs, combinations of ORFs, genes, etc. is routine.

Typically such amplifications will utilize the DNA of an organism containing
the requisite genes (e.g. Streptomyces verticillus) as a template. Typical amplification
conditions include a PCR mixture consisting of 5 ng of S. verticillus genomic or plasmid
DNA as template, 25 pmoles of each primer, 25 µM dNTP, 5% DMSO, 2 units of Taq
polymerase, 1 x buffer, with or without 20% glycerol in a final volume of 50 µL. PCR is
carried out (e.g. on a GeneAmp PCR System 2400 (Perkin-Elmer/ABI)) with a cycling
scheme as follows: initial denaturing at 94°C for 5 min, 24-36 cycles of 45 sec at 94°C, 1
min at 60°C, 2 min at 72°C, followed by an additional 7 min at 72°C. One of skill will
appreciate that optimization of such a protocol, e.g. to improve yield, etc. is routine (see, e.g.,
U.S. Patent No. 4,683,202; Innis (1990) PCR Protocols: A Guide to Methods and
Applications, Academic Press Inc., San Diego, CA, etc.). In addition, primers may be designed
to introduce restriction sites and so facilitate cloning of the amplified sequence into a vector.
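
As a rough worked example of the cycling scheme above, the following Python sketch adds up the programmed hold times for the lower and upper cycle numbers. It assumes that only the hold times stated in the text count (ramp times between temperatures are ignored), and the function name is hypothetical.

```python
def program_seconds(cycles):
    """Total programmed hold time, in seconds, for the cycling scheme in the text."""
    initial_denature = 5 * 60      # 94 C for 5 min
    per_cycle = 45 + 60 + 120      # 45 sec at 94 C, 1 min at 60 C, 2 min at 72 C
    final_extension = 7 * 60       # additional 7 min at 72 C
    return initial_denature + cycles * per_cycle + final_extension

for n in (24, 36):
    print(n, "cycles:", program_seconds(n) / 60, "min")
```

On these assumptions the program runs for roughly 102 minutes at 24 cycles and 147 minutes at 36 cycles, before instrument ramping is taken into account.
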

Using the information provided herein other approaches to cloning the desired
sequences will be apparent to those of skill in the art. For example, the PKS or NRPS
modules or enzymatic domains of interest can be obtained from an organism that expresses
the same, using recombinant methods, such as by screening cDNA or genomic libraries
derived from cells expressing the gene, or by deriving the gene from a vector known to
include the same. The gene can then be isolated and combined with other desired NRPS
and/or PKS modules or domains, using standard techniques. If the gene in question is
already present in a suitable expression vector, it can be combined in situ with, e.g., other
PKS subunits, as desired. The gene of interest can also be produced synthetically, rather
than cloned. The nucleotide sequence can be designed with the appropriate codons for the
particular amino acid sequence desired. In general, one will select preferred codons for the
intended host in which the sequence will be expressed. The complete sequence can be
assembled from overlapping oligonucleotides prepared by standard methods and assembled
into a complete coding sequence (see, e.g., Edge (1981) Nature 292: 756; Nambiar et al.
(1984) Science 223: 1299; Jay et al. (1984) J. Biol. Chem. 259: 6311). In addition, it is noted
that custom gene synthesis is commercially available (see, e.g., Operon Technologies,
Alameda, CA).
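
The selection of preferred codons for an intended host, as described above, can be sketched in Python as follows. This is a minimal illustration only: the codon table is a hypothetical placeholder for a high-GC host and does not represent measured codon usage for any organism, and the function name is likewise an assumption.

```python
# Hypothetical preferred-codon table for a high-GC host; the entries are
# placeholders for illustration, not measured codon-usage data.
PREFERRED = {
    "M": "ATG", "A": "GCC", "K": "AAG", "L": "CTG",
    "S": "TCC", "G": "GGC", "T": "ACC", "*": "TGA",
}

def back_translate(protein, table=PREFERRED):
    """Return a coding sequence using one preferred codon per residue."""
    missing = sorted(set(protein) - set(table))
    if missing:
        raise KeyError(f"no preferred codon for: {missing}")
    return "".join(table[residue] for residue in protein)

print(back_translate("MAKLST*"))
```

In practice the table would be completed for all twenty amino acids from codon-usage data for the chosen host before the designed sequence is divided into overlapping oligonucleotides.
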

Examples of such techniques and instructions sufficient to direct persons of
skill through many cloning exercises are found in Berger and Kimmel (1989) Guide to
Molecular Cloning Techniques, Methods in Enzymology 152, Academic Press, Inc., San
Diego, CA (Berger); Sambrook et al. (1989) Molecular Cloning - A Laboratory Manual (2nd
ed.) Vol. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor Press, NY; Ausubel
(1994) Current Protocols in Molecular Biology, Current Protocols, a joint venture between
Greene Publishing Associates, Inc. and John Wiley & Sons, Inc.; U.S. Patent 5,017,478; and
European Patent No. 0,246,864.

II. Expression of blm gene clusters, modules, and enzymatic domains. 

As indicated above, in one embodiment this invention provides novel NRPS 
and PKS genes for the efficient recombinant production of both novel and known 
polyketides, peptides, and polyketide/polypeptide hybrids by expressing them in vivo. In
other embodiments, such syntheses are carried out in vitro. Even in vitro syntheses,
however, typically utilize recombinantly expressed PKSs, NRPSs, or enzymatic domains
thereof. Thus, it is frequently desirable to express protein components of the PKSs or NRPSs
described above.




Typically expression of the protein components of the pathway and/or of the 
products of the NRPS/PKS pathway is accomplished by placing the subject PKS or NRPS 
nucleic acid(s) in an expression vector, and transfecting a cell with the vector such that the 
cell expresses the desired product(s). 

A) Expression vectors

The choice of vector depends on the sequence(s) that are to be expressed.
Any transducible cloning vector can be used as a cloning vector for the nucleic acid
constructs of this invention. However, where large clusters are to be expressed, it is
preferred that phagemids, cosmids, P1s, YACs, BACs, PACs, HACs or similar cloning vectors
be used for cloning the nucleotide sequences into the host cell. Phagemids, cosmids, and
BACs, for example, are advantageous vectors due to the ability to insert and stably
propagate therein larger fragments of DNA than in M13 phage and lambda phage,
respectively. Phagemids which will find use in this method generally include hybrids
between plasmids and filamentous phage cloning vehicles. Cosmids which will find use in
this method generally include lambda phage-based vectors into which cos sites have been
inserted. Recipient pool cloning vectors can be any suitable plasmid. The cloning vectors
into which pools of mutants are inserted may be identical or may be constructed to harbor
and express different genetic markers (see, e.g., Sambrook et al., supra). The utility of
employing such vectors having different marker genes may be exploited to facilitate a
determination of successful transduction.

In preferred embodiments of this invention, vectors are used to introduce 
PKS, NRPS, or NRPS/PKS genes or gene clusters into host (e.g. Streptomyces) cells. 
Numerous vectors for use in particular host cells are well known to those of skill in the art. 
For example, Malpartida and Hopwood (1984) Nature, 309: 462-464; Kao et al.
(1994) Science, 265: 509-512; and Hopwood et al. (1987) Methods Enzymol., 153: 116-166
all describe vectors for use in various Streptomyces hosts.

In a preferred embodiment, Streptomyces vectors are used that include 
sequences that allow their introduction and maintenance in E. coli. Such Streptomyces/E.
coli shuttle vectors have been described (see, for example, Vara et al. (1989) J. Bacteriol.,
171: 5872-5881; Guilfoile & Hutchinson (1991) Proc. Natl. Acad. Sci. USA, 88: 8553-8557).

The gene sequences, or fragments thereof, which collectively encode a PKS
and/or NRPS and/or PKS/NRPS gene cluster, one or more ORFs, one or more modules, or
one or more enzymatic domains of this invention, can be inserted into one or more
expression vectors, using methods known to those of skill in the art. Expression vectors will
include control sequences operably linked to the desired NRPS and/or PKS coding sequence
or fragment thereof. Suitable expression systems for use with the present invention include
systems that function in eukaryotic and prokaryotic host cells. However, as explained above,
prokaryotic systems are preferred, and in particular, systems compatible with Streptomyces
spp. are of particular interest. Control elements for use in such systems include promoters,
optionally containing operator sequences, and ribosome binding sites. Particularly useful
promoters include control sequences derived from PKS and/or NRPS gene clusters, such as
one or more act promoters. However, other bacterial promoters, such as those derived from
sugar metabolizing enzymes, such as galactose, lactose (lac) and maltose, will also find use
in the present constructs. Additional examples include promoter sequences derived from
biosynthetic enzymes such as tryptophan (trp), the beta-lactamase (bla) promoter system,
bacteriophage lambda PL, and T5. In addition, synthetic promoters, such as the tac promoter
(U.S. Patent 4,551,433), which do not occur in nature also function in bacterial host cells. In
Streptomyces, numerous promoters have been described including constitutive promoters,
such as ermE and tcmG (Shen and Hutchinson (1994) J. Biol. Chem. 269: 30726-30733), as
well as controllable promoters such as actI and actIII (Pieper et al. (1995) Nature, 378:
263-266; Pieper et al. (1995) J. Am. Chem. Soc., 117: 11373-11374; and Wiesmann et al.
(1995) Chem. & Biol. 2: 583-589).

Other regulatory sequences may also be desirable which allow for regulation
of expression of the PKS replacement sequences relative to the growth of the host cell.
Regulatory sequences are known to those of skill in the art, and examples include those
which cause the expression of a gene to be turned on or off in response to a chemical or
physical stimulus, including the presence of a regulatory compound. Other types of
regulatory elements may also be present in the vector, for example, enhancer sequences.

Selectable markers can also be included in the recombinant expression 
vectors. A variety of markers are known which are useful in selecting for transformed cell 
lines and generally comprise a gene whose expression confers a selectable phenotype on 
transformed cells when the cells are grown in an appropriate selective medium. Such
markers include, for example, genes that confer antibiotic resistance or sensitivity to the
plasmid. Alternatively, several polyketides are naturally colored and this characteristic
provides a built-in marker for selecting cells successfully transformed by the present
constructs.

The various PKS and/or NRPS clusters or subunits of interest can be cloned
into one or more recombinant vectors as individual cassettes, with separate control elements,
or under the control of, e.g., a single promoter. The PKS and/or NRPS subunits can include
flanking restriction sites to allow for the easy deletion and insertion of other PKS subunits so
that hybrid PKSs can be generated. The design of such unique restriction sites is known to
those of skill in the art and can be accomplished using the techniques described above, such
as site-directed mutagenesis and PCR.

Methods of cloning and expressing large nucleic acids such as gene clusters,
including PKS- or NRPS-encoding gene clusters, in cells including Streptomyces are well
known to those of skill in the art (see, e.g., Stutzman-Engwall and Hutchinson (1989) Proc.
Natl. Acad. Sci. USA, 86: 3135-3139; Motamedi and Hutchinson (1987) Proc. Natl. Acad.
Sci. USA, 84: 4445-4449; Grimm et al. (1994) Gene, 151: 1-10; Kao et al. (1994) Science,
265: 509-512; and Hopwood et al. (1987) Meth. Enzymol., 153: 116-166). In some
examples, nucleic acid sequences of well over 100 kb have been introduced into cells,
including prokaryotic cells, using vector-based methods (see, for example, Osoegawa et al.
(1998) Genomics, 52: 1-8; Woon et al. (1998) Genomics, 50: 306-316; Huang et al. (1996)
Nucl. Acids Res., 24: 4202-4209). In addition, the cloning and overexpression of NRPS-1
and NRPS-6 is illustrated in Example 1.

In certain embodiments this invention may make use of genetically
engineered cells that either lack PKS and/or NRPS genes or have their naturally occurring
PKS and/or NRPS genes substantially deleted. These host cells can be transformed with
recombinant vectors, encoding a variety of PKS and/or NRPS gene clusters, for the
production of active polyketides. The invention provides for the production of significant
quantities of product, e.g. a bleomycin, at an appropriate stage of the growth cycle. The
BLMs or other hybrid polyketide/peptide metabolites so produced can be used as therapeutic
agents, to treat a number of disorders, depending on the type of metabolites in question. For
example, several of the polyketides and peptides produced by the present method will find
use as immunosuppressants, as anti-tumor agents, as well as for the treatment of viral,
bacterial and parasitic infections. The ability to recombinantly produce polyketides and
peptides also provides a powerful tool for characterizing PKSs and/or NRPSs and the
mechanism of their actions.

B) Host cells.

The vectors described above can be used to express various protein
components of the polyketide and/or polypeptide synthetic modules for subsequent isolation
and/or to provide a biological synthesis of one or more desired biomolecules (e.g.
polyketides, peptides, etc.). Where one or more proteins of the blm cluster are expressed
(e.g. overexpressed) for subsequent isolation and/or characterization, the proteins are
expressed in any prokaryotic or eukaryotic cell suitable for protein expression. In one
preferred embodiment, the proteins are expressed in E. coli. Overexpression of blmI in E.
coli is described in Example 2.

Host cells for the recombinant production of the subject polyketides can be
derived from any organism with the capability of harboring a recombinant PKS, NRPS or
PKS/NRPS gene cluster. Thus, the host cells of the present invention can be derived from
either prokaryotic or eukaryotic organisms. However, preferred host cells are those
constructed from the actinomycetes, a class of mycelial bacteria which are abundant
producers of a number of polyketides and peptides. A particularly preferred genus for use
with the present system is Streptomyces. Thus, for example, S. verticillus, S. ambofaciens, S.
avermitilis, S. azureus, S. cinnamonensis, S. coelicolor, S. curacoi, S. erythraeus, S. fradiae,
S. galilaeus, S. glaucescens, S. hygroscopicus, S. lividans, S. parvulus, S. peucetius, S.
rimosus, S. roseofulvus, S. thermotolerans, S. violaceoruber, among others, will provide
convenient host cells for the subject invention, with S. coelicolor being preferred (see, e.g.,
Hopwood, D. A. and Sherman, D. H. Ann. Rev. Genet. (1990) 24: 37-66; O'Hagan, D. The
Polyketide Metabolites (Ellis Horwood Limited, 1991), for a description of various
polyketide-producing organisms and their natural products).

In a preferred embodiment, the above-described cells are genetically
engineered by deleting one or more naturally occurring PKS and/or NRPS genes therefrom,
using standard techniques, such as by homologous recombination (see, e.g., Khosla et al.
(1992) Molec. Microbiol. 6: 3237).

In certain embodiments, a eukaryotic host cell is preferred (e.g. where certain
glycosylation patterns are desired). Suitable eukaryotic host cells are well known to those of
skill in the art. Such eukaryotic cells include, but are not limited to, yeast cells, insect cells,
plant cells, fungal cells, and various mammalian cells (e.g. COS, CHO, and HeLa cell lines and
various myeloma cell lines).

C) Protein/polyketide recovery.

Polypeptide and/or polyketide recovery is accomplished according to standard
methods well known to those of skill in the art. Thus, for example, where blm cluster
proteins are to be expressed and isolated, the proteins can be expressed with a convenient tag
to facilitate isolation (e.g. a His6 tag). Other standard protein purification techniques are
suitable and well known to those of skill in the art (see, e.g., Quadri et al. (1998)
Biochemistry 37: 1585-1595; Nakano et al. (1992) Mol. Gen. Genet. 232: 313-321, etc.).

Similarly, where components (e.g. modules and/or enzymatic domains) of the
blm cluster are used to express various biomolecules (e.g. polyketides, sugars, polypeptides,
etc.) the desired product and/or shunt metabolite(s) are isolated according to standard
methods well known to those of skill in the art (see, e.g., Carreras and Khosla (1998)
Purification and in vitro reconstitution of the essential protein components of an aromatic
polyketide synthase. Biochemistry 37: 2084-2088; Deutscher (1990) Methods in Enzymology
Volume 182: Guide to Protein Purification, M. Deutscher, ed.).

III. Synthesis of recombinant bleomycins.

In one embodiment this invention provides methods of synthesizing 
bleomycins and recombinantly synthesized bleomycins. As indicated above, this is generally 
accomplished by providing an organism (e.g. a bacterial cell) containing sufficient
components of the blm gene cluster to direct synthesis of a complete bleomycin.

In one embodiment, the entire blm cluster is cloned into a Streptomyces strain
(e.g., S. lividans or S. coelicolor). Kao et al. (1994) Science, 265: 509-512, have cloned the
30 kb DEBS genes from Sacc. erythraea into S. coelicolor and produced
6-deoxyerythronolide B in S. coelicolor, and these methods can be used to construct an
expression plasmid for heterologous expression of the blm cluster. This method involves the
transfer of DNA between a temperature-sensitive plasmid and a shuttle vector by means of a
homologous double recombination event in E. coli (Id.). In a preferred embodiment, the two
ends spanning the blm cluster are cloned into a temperature-sensitive plasmid that is
chloramphenicol resistant (CmR) such as pCK6. S. verticillus DNA is then rescued from a
donor into the temperature-sensitive recipient by co-transforming E. coli with the CmR
recipient plasmid and the apramycin resistant (ApR) pKC505 donor cosmid that contains the
blm gene cluster, followed by chloramphenicol and apramycin selection at 30°C. Colonies
harboring both plasmids (CmR, ApR) are shifted to 44°C on chloramphenicol and
apramycin plates, where only those cointegrates formed by a single recombination event
between the two plasmids are viable. Surviving colonies are then propagated at 30°C on
chloramphenicol plates to select for recombinant plasmids formed by the resolution of
cointegrates through a second recombination event. The desired blm cluster is thereby cloned
into the CmR temperature-sensitive plasmid and is ready to be moved into any expression
plasmid by a similar homologous recombination event.
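
For orientation only, the temperature and selection regime of the rescue procedure above can be laid out as a small Python table. The class and field names are hypothetical, and the sketch records only the conditions stated in the text; it does not model the recombination events themselves.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Step:
    temperature_c: int   # incubation temperature in degrees Celsius
    selection: tuple     # antibiotics present on the plates
    purpose: str         # what the step selects for, per the text

RESCUE_STEPS = (
    Step(30, ("chloramphenicol", "apramycin"),
         "co-transform E. coli and keep colonies harboring both plasmids"),
    Step(44, ("chloramphenicol", "apramycin"),
         "keep cointegrates formed by a single recombination event"),
    Step(30, ("chloramphenicol",),
         "resolve cointegrates through a second recombination event"),
)

for number, step in enumerate(RESCUE_STEPS, start=1):
    drugs = " + ".join(step.selection)
    print(f"{number}. {step.temperature_c} C, {drugs}: {step.purpose}")
```
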

For example, if pWHM861 is the choice of shuttle plasmid for the expression
of the blm cluster in S. lividans (Meurer and Hutchinson (1995) J. Bacteriol., 177: 477-481),
the two ends spanning the blm cluster are cloned downstream of the ErmE* promoter in the
ampicillin resistant (AMR) plasmid pWHM861. The resulting plasmid is co-transformed
with the temperature-sensitive plasmid containing the blm cluster described above into E.
coli under the selection of chloramphenicol and ampicillin at 30°C. These CmR and AMR
colonies are shifted to 44°C on chloramphenicol and ampicillin plates to undergo a single
recombination event, and the surviving colonies are resolved on ampicillin plates at 30°C by
completing the double recombination process. The resulting plasmid, in which expression of
the desired blm cluster is under the control of the ErmE* promoter, is suitable for
transformation into S. lividans with thiostrepton selection. The S. lividans
transformants are cultured and any metabolites produced are isolated and characterized.

Once production of BLM in S. lividans is established, mutated alleles of the 
blm synthetase can be introduced into the blm cluster for the production of BLM analogs. 

IV. Altered endogenous expression of bleomycins.

Using the blm gene cluster information provided herein, one of skill in the art
may regulate the synthesis of endogenous bleomycin. The expression of various ORFs
comprising the blm gene cluster may be increased or decreased to alter bleomycin synthesis 
levels. 

Methods of altering the expression of endogenous genes are well known to
those of skill in the art. Typically such methods involve altering or replacing all or a portion
of the regulatory sequences controlling expression of the particular gene that is to be 
regulated. In a preferred embodiment, the regulatory sequences (e.g., the native promoter) 
upstream of one or more of the blm ORFs are altered. 

This is typically accomplished by the use of homologous recombination to
introduce a heterologous nucleic acid into the native regulatory sequences. To downregulate
expression of one or more blm ORFs, simple mutations that either alter the reading frame or
disrupt the promoter are suitable. To upregulate expression of the blm ORF(s) the native
promoter(s) can be substituted with heterologous promoter(s) that induce higher than normal
levels of transcription.

In a particularly preferred embodiment, nucleic acid sequences comprising the 
structural gene in question or upstream sequences are utilized for targeting heterologous 
recombination constructs.

The use of homologous recombination to alter expression of endogenous 
genes is described in detail in U.S. Patent 5,272,071, WO 91/09955, WO 93/09222, WO 
96/29411, WO 95/31560, and WO 91/12650. 

V. Synthesis of BLM analogs. 

In one embodiment, this invention provides methods of synthesizing
modified bleomycins or bleomycin analogs. In preferred embodiments, the BLM analogs are
synthesized either by introducing specific perturbations into individual NRPS and/or PKS
enzymatic domains or modules, or by reprogramming the linear order in which the NRPS or
PKS enzymatic domains and/or modules appear in the blm synthetase genes. The former
will lead to BLM analogs with targeted modifications at the BLM backbone and the latter
will allow incorporation of other extension units in variable sequence into the biosynthesis of
BLM. In particularly preferred embodiments, the genetically modified blm synthetases are
produced in S. verticillus; however, it will be recognized that the entire blm gene cluster can
be cloned into other hosts, e.g. into S. lividans or S. coelicolor.

In preferred embodiments modification of the blm gene cluster to yield BLM
analogs is accomplished by one of two different approaches. In one approach, the BLM
enzymatic domains and/or modules are altered in a directed manner (i.e. they are
changed in a preselected way), while in another approach, random/haphazard alterations are
introduced into the blm cluster and the resulting products are screened to identify those with
desired properties.

A) Synthesis of BLM analogs by specific engineering of the blm synthetase 
genes. 

The blm synthetase genes can be re-engineered by means of specific
mutations or by reprogramming the linear order of the NRPS or PKS enzymatic domains or
modules. In this approach, a wild-type blm synthetase allele is replaced with these mutants
and expressed in an appropriate host (e.g., S. verticillus or a heterologous host). Since
both NRPSs (Stachelhaus et al. (1995) Science, 269: 69-72) and PKSs (Donadio et al. (1993)
Proc. Natl. Acad. Sci. USA, 90: 7119-7123; Donadio et al. (1995) J. Am. Chem. Soc., 117:
9105-9106; Cortes et al. (1995) Science, 268: 1487-1489) have shown considerable tolerance
to reprogramming, it is expected that these modifications of the BLM synthetase will result
in the production of BLM analogs with predicted structural alterations. For example,
targeted modification at the (2S,3S,4R)-4-amino-3-hydroxy-2-methylpentanoic acid (AHM)
moiety of BLM can be accomplished by introduction of mutations into the BlmVIII PKS
module of the BLM synthetase locus. Inactivation of the MT or KR motif by in-frame
deletion or site-directed mutagenesis will result in the production of BLM analogs containing
a demethyl-AHM, oxo-AHM, or oxo-demethyl-AHM moiety, etc.
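
The relationship between the inactivated motif and the expected moiety can be summarized in a short Python sketch. It assumes, reading the three names given above, that loss of MT alone corresponds to demethyl-AHM, loss of KR alone to oxo-AHM, and loss of both to oxo-demethyl-AHM; the function name is hypothetical and the mapping is a naming aid rather than an experimental result.

```python
def predicted_moiety(mt_active=True, kr_active=True):
    """Name the AHM-derived moiety expected for a given MT/KR state.

    Assumption: MT inactivation gives the demethyl form and KR
    inactivation gives the oxo form, as the names in the text suggest.
    """
    prefix = ""
    if not kr_active:
        prefix += "oxo-"
    if not mt_active:
        prefix += "demethyl-"
    return prefix + "AHM"

for mt, kr in [(True, True), (False, True), (True, False), (False, False)]:
    print(f"MT active={mt}, KR active={kr}: {predicted_moiety(mt, kr)}")
```
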

Alternatively, individual functional NRPS domains and/or the PKS module
can be deleted or the PKS module can be duplicated in-frame to produce BLM analogs with
a shorter or longer backbone, respectively. Alternatively, or in addition, the NRPS domains or
the PKS module can be rearranged for the production of BLM analogs with a completely
different backbone. The NRPS and PKS features can be combined into one integrated
system, providing access to a structural variation not available by either the NRPS or PKS
system alone.

To create such mutations, plasmids are constructed carrying in-frame
deletions of DNA segments encompassing a portion of the blm synthetase activities.
Construction of specific deletions is preferably accomplished by one of the following two
strategies. The first involves subcloning of a DNA fragment in a gene replacement vector,
selection of two restriction sites suitably located at the two ends of the DNA segment, and
deletion of this segment from within the plasmid by rejoining the two resulting ends. An
in-frame deletion can be obtained by a suitable combination of Klenow filling and S1
treatment of both ends prior to ligation.
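
The requirement that a deletion be in-frame can be illustrated with a minimal Python sketch. A deletion leaves the downstream reading frame intact only when the number of nucleotides removed is a multiple of three; the function name and the short test sequence below are hypothetical.

```python
def delete_in_frame(coding_seq, start, end):
    """Delete coding_seq[start:end] (0-based, end exclusive), keeping the frame."""
    if not 0 <= start < end <= len(coding_seq):
        raise ValueError("deletion must lie inside the sequence")
    if (end - start) % 3 != 0:
        raise ValueError("deletion length must be a multiple of 3 to stay in frame")
    return coding_seq[:start] + coding_seq[end:]

orf = "ATGGCCAAGCTGTCCACCTGA"   # hypothetical 7-codon reading frame
print(delete_in_frame(orf, 6, 12))
```

The same check applies whichever construction strategy is used, since the frame depends only on the net number of nucleotides removed.
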

The second approach involves polymerase chain reaction (PCR) amplification
of two DNA segments that flank the region to be deleted followed by joining of the two
fragments in the correct orientation in a gene replacement vector. This can be accomplished
by designing PCR primers with suitable restriction sites. The restriction site used to generate
the deletion and the sequences to serve as templates for the PCR amplification are chosen so
as to generate two segments of blm synthetase DNA of approximately equal length in the
construction in order to maximize the chance of gene replacement. The gene replacement
vector containing the allelic or deletion mutation is introduced into a Streptomyces strain
(e.g., S. verticillus). Integration of the plasmid into the S. verticillus chromosome via a
single reciprocal homologous recombination will yield a recombinant that will be isolated by
selection for the vector marker. The resulting integrants are then grown under non-selective
conditions and further resolution by selection for the loss of the vector marker via the second
homologous recombination event will produce the desired deletion mutants.
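
The choice of two flanking segments of equal length for the PCR-based approach can be sketched as a simple coordinate calculation in Python. The function name, the coordinate convention (0-based, end exclusive) and the example numbers are assumptions made for illustration; exactly equal arms are used here as the simplest case of "approximately equal length".

```python
def flanking_arms(seq_length, del_start, del_end, arm_length):
    """Return (left, right) intervals of equal length on either side of a deletion."""
    if not 0 <= del_start < del_end <= seq_length:
        raise ValueError("deletion must lie inside the sequence")
    if del_start - arm_length < 0 or del_end + arm_length > seq_length:
        raise ValueError("arms of this length do not fit around the deletion")
    left = (del_start - arm_length, del_start)
    right = (del_end, del_end + arm_length)
    return left, right

# Hypothetical example: delete positions 5000-6500 of a 10 kb fragment
# using 1 kb arms on each side.
print(flanking_arms(10000, 5000, 6500, 1000))
```
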

Southern analysis of the isolated deletion mutants with the target DNA is
performed to ensure that the expected double crossover recombination event has taken place.
The first approach is convenient if there are suitably spaced restriction sites in the DNA
sequence. The second approach enables the deletion of any DNA segment but may be
limited by the size of the DNA segments that can be amplified by PCR. These S. verticillus
recombinants are cultured under typical conditions for BLM production and the fermentation
broth is screened for the production of any novel BLM analogs resulting from the specific
mutations in the blm synthetase locus.

B) Synthesis of BLM analogs by "random" modification of blm synthetase
genes. 

Bleomycin analogs can also be synthesized by randomly/haphazardly altering
genes in the BLM cluster, expressing the products of the randomly modified megasynthetase,
and then screening the products for the desired activity. Methods of "randomly" altering blm
cluster genes are described below.

VI. Generation of other synthetic systems. 

In addition to the production of bleomycin or modified bleomycins, the blm
gene cluster or elements thereof can be used by themselves or in combination with NRPS
and/or PKS modules and/or enzymatic domains of other PKS and/or NRPS systems to
produce a wide variety of compounds including, but not limited to, various polyketides,
polypeptides, polyketide/polypeptide hybrids, various oxazoles and thiazoles, various sugars,
various methylated polypeptides/polyketides, and the like. As with the production of
modified bleomycins described above, such compounds can be produced, in vivo or in vitro,
by catalytic biosynthesis using large, modular PKSs, NRPSs, and hybrid PKS/NRPS
systems. The megasynthetases directing such syntheses can be rationally designed, e.g. by
predetermined alteration/modification of polyketide and/or polypeptide and/or hybrid
PKS/NRPS pathways. Alternatively, large combinatorial libraries of cells harboring various
megasynthetases can be produced by the random modification of particular pathways and
then selected for the production of a molecule or molecules of interest. It will be appreciated
that, in certain embodiments, such libraries of megasynthetases/modified pathways can be
used to generate large, complex combinatorial libraries of compounds which themselves can
be screened for a desired activity.

A) Directed modification of biomolecules. 

Elements (e.g. open reading frames) of the blm biosynthetic gene cluster
and/or variants thereof can be used in a wide variety of "directed" biosynthetic processes (i.e.
where the process is designed to modify and/or synthesize one or more particular preselected
metabolite(s)). Polypeptides encoded by particular open reading frames or combinations of
open reading frames can be utilized to perform particular chemical modifications of
biological molecules.

Thus, for example, open reading frames encoding a polypeptide synthetase can
be used to chemically modify an amino acid by coupling it to another amino acid. In another
example, the methyl transferase in BlmVIII can be utilized to introduce methyl groups into
polyketides and other substrates. The glycosyl transferases can be used to glycosylate
appropriate substrates, and so forth. These examples are merely illustrative. One of skill in
the art, utilizing the information provided here, can perform literally countless chemical
modifications and/or syntheses using either "native" bleomycin biosynthesis metabolites as
the substrate molecule, or other molecules capable of acting as substrates for the particular
enzymes in question. Other substrates can be identified by routine screening. Methods of
screening enzymes for specific activity against particular substrates are well known to those
of skill in the art.

The biosyntheses can be performed in vivo, e.g. by providing a host cell
comprising the desired blm gene cluster open reading frame(s) and/or in vitro, e.g., by
providing the polypeptides encoded by the blm gene cluster ORFs and the appropriate
substrates and/or cofactors.

B) Directed engineering of novel synthetic pathways.

In numerous embodiments of this invention, novel polyketides, polypeptides,
and combinations thereof are created by modifying known PKSs or NRPSs so as to introduce
variations into known polymers synthesized by the enzymes. Such variations may be
introduced by design, for example to modify a known molecule in a specific way, e.g. by
replacing a single monomeric unit within a polymer with another, thereby creating a
derivative molecule of predicted structure. Such variations can also be made by adding one
or more modules to a known PKS or NRPS, or by removing one or more modules from a
known PKS or NRPS. Such novel PKSs or NRPSs can readily be made using a variety of
techniques, including recombinant methods and in vitro synthetic methods.

Using any of these methods, it is possible to introduce PKS domains into a 
NRPS, or vice versa, thereby creating novel molecules including both peptide and polyketide 
structural domains. For example, a PKS enzyme producing a known polyketide can be
modified so as to include an additional module that adds a peptide moiety into the 
polyketide. Novel molecules synthesized using these methods can be screened, using 
standard methods, for any activity of interest, such as antibiotic activity, effects on the cell 
cycle, effects on the cytoskeleton, etc. 

Novel polyketides, polypeptides, or combinations thereof can also be made by 
creating novel PKSs or NRPSs de novo, using recombinant or in vitro synthetic methods. 
Such novel arrangements of domains can be designed, e.g., to create a specific polymer. In 
addition to creating novel PKSs or NRPSs by combining modules, the methods of this 
invention can also be used to make novel modules that can add new monomeric units to a 
growing polypeptide or polyketide chain. Because the identity of each module, and, 
consequently, the identity of the monomer added by the module, is determined by the 
identity and number of the functional domains comprising the module, it is possible to 
produce novel monomeric units by creating novel combinations of functional domains within 
a module. Such novel modules can be created by design, for example to make a specific 
module that will add a specific monomer to a polyketide or polypeptide, or can be created by 
the random association of domains so as to produce libraries of novel modules. Such novel 
modules can be made using recombinant or in vitro synthetic means. 

Mutations can be made to the native NRPS and/or PKS subunit sequences and 
such mutants used in place of the native sequence, so long as the mutants are able to function 
with other NRPS and/or PKS subunits to collectively catalyze the synthesis of an identifiable 
polyketide and/or polypeptide. Such mutations can be made to the native sequences using 
conventional techniques such as by preparing synthetic oligonucleotides including the 
mutations and inserting the mutated sequence into the gene encoding an NRPS and/or PKS 
subunit using restriction endonuclease digestion (see, e.g., Kunkel (1985) Proc. Natl. Acad. 
Sci. USA 82: 448; Geisselsoder et al. (1987) BioTechniques 5: 786). Alternatively, the 
mutations can be effected using a mismatched primer (generally 10-20 nucleotides in length) 
which hybridizes to the native nucleotide sequence at a temperature below the melting 
temperature of the mismatched duplex. The primer can be made specific by keeping primer 
length and base composition within relatively narrow limits and by keeping the mutant base 
centrally located (Zoller and Smith (1983) Meth. Enzymol. 100: 468). Primer extension is 
effected using DNA polymerase, the product is cloned, and clones containing the mutated 
DNA, derived by segregation of the primer-extended strand, are selected. Selection can be 
accomplished using the mutant primer as a hybridization probe. The technique is also 
applicable for generating multiple point mutations (see, e.g., Dalbadie-McFarland et al. (1982) 
Proc. Natl. Acad. Sci. USA 79: 6409). PCR mutagenesis will also find use for effecting the 
desired mutations. 
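
By way of illustration only, the placement of the mutant base at the center of a mismatched primer, and a rough estimate of its melting temperature, can be sketched in Python as follows. The template sequence, the function names, and the default primer length are hypothetical and are not taken from this disclosure. The Wallace rule (2 degrees C per A/T, 4 degrees C per G/C) is used here only as a coarse approximation for short oligonucleotides; it does not account for the destabilizing effect of the mismatch itself.

```python
def wallace_tm(oligo: str) -> int:
    """Rough melting temperature (deg C) by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc


def mutagenic_primer(template: str, pos: int, new_base: str, length: int = 17) -> str:
    """Return a primer of odd `length` centred on `pos` (0-based) that carries `new_base`."""
    if length % 2 == 0:
        raise ValueError("use an odd length so the mutant base sits exactly in the centre")
    half = length // 2
    if pos - half < 0 or pos + half >= len(template):
        raise ValueError("mutation site is too close to an end of the template")
    window = template[pos - half : pos + half + 1].upper()
    # Replace only the central base; the flanks stay complementary to the native sequence.
    return window[:half] + new_base.upper() + window[half + 1 :]


# Hypothetical 30-nt stretch of a native sequence (illustration only).
template = "ATGGCTGCTTCCCTGACCCGCCTGGCCGAA"
primer = mutagenic_primer(template, pos=15, new_base="T", length=17)
print(primer, wallace_tm(primer))
```

In practice the hybridization temperature would be chosen below the estimate for the mismatched duplex, as stated above, and the estimate itself would be refined with a nearest-neighbor model.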

C) Random modification of PKS/NRPS pathways. 

In another embodiment, variations can be made randomly, for example by 
making a library of molecular variants of a known polymer by randomly mutating one or 
more PKS or NRPS modules and/or enzymatic domains or by randomly replacing one or 
more modules or enzymatic domains in a known PKS or NRPS with a collection of 
alternative modules and/or enzymatic domains. 

The PKS and/or NRPS modules can be combined into a single multi-modular 
enzyme, thereby dramatically increasing the number of possible combinations obtained using 
these methods. These combinations can be made using standard recombinant or nucleic acid 
amplification methods, for example by shuffling nucleic acid sequences encoding various 
modules or enzymatic domains to create novel arrangements of the sequences, analogous to 
DNA shuffling methods described in Crameri et al. (1998) Nature 391: 288-291, and in U.S. 
Patents 5,605,793 and 5,837,458. In addition, novel combinations can be made in vitro, 
for example by combinatorial synthetic methods. Novel polymers, or polymer libraries, can 
be screened for any specific activity using standard methods. 
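
The growth in the number of possible combinations as modules are joined into a multi-modular enzyme can be illustrated with a short Python sketch. The module labels below are hypothetical placeholders, and the sketch assumes that every module may occupy every position and may be reused, which need not hold for real synthases.

```python
from itertools import product


def enumerate_assemblies(module_pool, n_positions):
    """List every ordered arrangement of n_positions modules drawn (with reuse) from the pool."""
    return [tuple(combo) for combo in product(module_pool, repeat=n_positions)]


# Hypothetical module labels, for illustration only.
pool = ["NRPS-Ser", "NRPS-Cys", "PKS-malonyl", "PKS-methylmalonyl"]

for n in (1, 2, 3):
    # The library size is len(pool) ** n, so it grows geometrically with enzyme length.
    print(n, len(enumerate_assemblies(pool, n)))
```

With four modules, a three-module enzyme already admits 64 ordered arrangements, which is why screening of the resulting polymers is generally automated.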

Random mutagenesis of the nucleotide sequences obtained as described above 
can be accomplished by several different techniques known in the art, such as by altering 
sequences within restriction endonuclease sites, inserting an oligonucleotide linker randomly 
into a plasmid, by irradiation with X-rays or ultraviolet light, by incorporating incorrect 
nucleotides during in vitro DNA synthesis, by error-prone PCR mutagenesis, by preparing 
synthetic mutants or by damaging plasmid DNA in vitro with chemicals. Chemical mutagens 
include, for example, sodium bisulfite, nitrous acid, hydroxylamine, agents which damage or 
remove bases thereby preventing normal base-pairing such as hydrazine or formic acid, 
analogues of nucleotide precursors such as nitrosoguanidine, 5-bromouracil, 2-aminopurine, 
or acridine intercalating agents such as proflavine, acriflavine, quinacrine, and the like. 
Generally, plasmid DNA or DNA fragments are treated with chemicals, transformed into E. 
coli and propagated as a pool or library of mutant plasmids. 

Large populations of random enzyme variants can be constructed in vivo 
using "recombination-enhanced mutagenesis." This method employs two or more pools of, 
for example, 10^6 mutants each of the wild-type encoding nucleotide sequence that are 
generated using any convenient mutagenesis technique, described more fully above, and then 
inserted into cloning vectors. 

D) Incorporation and/or modification of non-blm cluster elements. 

In either the directed or random approaches, nucleic acids encoding novel 
combinations of modules and/or enzymatic domains are introduced into a cell. In one embodiment, 
nucleic acids encoding one or more PKS or NRPS domains are introduced into a cell so as to 
replace one or more domains of an endogenous PKS or NRPS within a chromosome of the 
cell. Endogenous gene replacement can be accomplished using standard methods, such as 
homologous recombination. Nucleic acids encoding an entire PKS, NRPS, or combination 
thereof can also be introduced into a cell so as to enable the cell to produce the novel 
enzyme, and, consequently, synthesize the novel polymer. In a preferred embodiment, such 
nucleic acids are introduced into the cell optionally along with a number of additional genes, 
together called a 'gene cluster,' that influence the expression of the genes, survival of the 
expressing cells, etc. In a particularly preferred embodiment, such cells do not have any 
other PKS- or NRPS-encoding genes or gene clusters, thereby allowing the straightforward 
isolation of the polymer synthesized by the genes introduced into the cell. 

Furthermore, the recombinant vector(s) can include genes from a single PKS 
and/or NRPS gene cluster, or may comprise hybrid replacement PKS gene clusters with, e.g., 
a gene from one cluster replaced by the corresponding gene from another gene cluster. For 
example, it has been found that ACPs are readily interchangeable among different synthases 
without an effect on product structure. Furthermore, a given KR can recognize and reduce 
polyketide chains of different chain lengths. Accordingly, these genes are freely 
interchangeable in the constructs described herein. Thus, the replacement clusters of the 
present invention can be derived from any combination of PKS and/or NRPS gene sets that 
ultimately function to produce an identifiable polyketide and/or peptide. 

Examples of hybrid replacement clusters include, but are not limited to, 
clusters with genes derived from two or more of the act gene cluster, the whiE gene cluster, 
frenolicin (fren), granaticin (gra), tetracenomycin (tcm), 6-methylsalicylic acid (6-msas), 
oxytetracycline (otc), tetracycline (tet), erythromycin (ery), griseusin (gris), nanaomycin, 
medermycin, daunorubicin, tylosin, carbomycin, spiramycin, avermectin, monensin, 
nonactin, curamycin, rifamycin and candicidin synthase gene clusters, among others. (For a 
discussion of various PKSs, see, e.g., Hopwood and Sherman (1990) Ann. Rev. Genet. 24: 
37-66; O'Hagan (1991) The Polyketide Metabolites, Ellis Horwood Limited.) 

A number of hybrid gene clusters have been constructed, having components 
derived from the act, fren, tcm, gris and gra gene clusters (see, e.g., U.S. Patent 5,712,146). 
Other hybrid gene clusters, as described above, can easily be produced and screened using 
the disclosure herein, for the production of identifiable polyketides, polypeptides or 
polyketide/polypeptide hybrids. 

Host cells (e.g. Streptomyces) can be transformed with one or more vectors, 
collectively encoding a functional PKS/NRPS set (e.g., one producing a bleomycin or 
bleomycin analog), or a cocktail comprising a random assortment of PKS and/or NRPS genes, 
modules, active sites, or portions thereof. The vector(s) can include native or hybrid 
combinations of PKS and/or NRPS subunits or cocktail components, or mutants thereof. As 
explained above, the gene cluster need not correspond to the complete native gene cluster but 
need only encode the necessary PKS and/or NRPS components to catalyze the production of 
the desired product. For example, in Streptomyces aromatic PKSs, carbon chain assembly 
requires the products of three open reading frames (ORFs). ORF1 encodes a ketosynthase (KS) 
and an acyltransferase (AT) active site (KS/AT); ORF2 encodes a chain length determining 
factor (CLF), a protein similar to the ORF1 product but lacking the KS and AT motifs; and 
ORF3 encodes a discrete acyl carrier protein (ACP). Some gene clusters also code for a 
ketoreductase (KR) and a cyclase, involved in cyclization of the nascent polyketide 
backbone. However, it has been found that only the KS/AT, CLF, and ACP need be present 
in order to produce an identifiable polyketide. Thus, in the case of aromatic PKSs derived 
from Streptomyces, these three genes, without the other components of the native clusters, 
can be included in one or more recombinant vectors, to constitute a "minimal" replacement 
PKS gene cluster. 
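
The "minimal" cluster requirement just described amounts to a set-membership check, which may be sketched in Python as follows. The function names and the string labels for the components are hypothetical conveniences and form no part of the disclosure.

```python
# Components stated above to be sufficient for an aromatic Streptomyces PKS.
MINIMAL_AROMATIC_PKS = frozenset({"KS/AT", "CLF", "ACP"})


def missing_minimal_components(cluster_genes):
    """Return, in sorted order, the minimal-PKS components absent from a candidate gene set."""
    return sorted(MINIMAL_AROMATIC_PKS - set(cluster_genes))


def is_minimal_replacement_cluster(cluster_genes):
    """True when the candidate set encodes at least KS/AT, CLF and ACP."""
    return not missing_minimal_components(cluster_genes)


# A candidate that also carries a ketoreductase still satisfies the minimal requirement.
print(is_minimal_replacement_cluster(["KS/AT", "CLF", "ACP", "KR"]))
# A candidate lacking CLF and ACP does not; the missing parts are reported.
print(missing_minimal_components(["KS/AT", "KR"]))
```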

E) Variation of starter and extender units. 

In addition to varying the PKS and/or NRPS modules and/or domains, 
variations in the products produced by various PKS/NRPS systems can be obtained by 
varying the starter units and/or the extender units. Thus, for example, a considerable degree 
of variability exists for starter units, e.g., acetyl CoA, malonamyl CoA, propionyl CoA, 
acetate, butyrate, isobutyrate and the like. In addition, naturally occurring PKSs and/or 
NRPSs have shown some tolerance for varying extender units. 

F) Examples of preferred modifications. 

As indicated above, the novel PKS and NRPS modules and enzymatic 
domains identified herein can be used to perform specific single modifications of particular 
substrates, or as components of complex synthetic pathways to generate particular products 
or large combinatorial libraries. As described in the Examples, a number of modules of the 
blm gene cluster provide novel functionality. By way of example, a few preferred reactions 
are listed below. These examples are intended to be illustrative and are neither exhaustive nor 
limiting. 

1. Use of BlmVIII PKS to introduce a branched methyl group. 

The blmVIII gene identified herein encodes a PKS module consisting of 
domains characteristic for known PKSs, such as ketoacyl synthase (KS), acyltransferase 
(AT), ketoreductase (KR), and ACP, with malonyl CoA acting as an extending unit. 
However, the identification of an integrated methyltransferase (MT) domain in the middle of 
BlmVIII is unique, representing the first PKS from actinomycetes that contains an internal 
MT domain. The use of this methyltransferase domain allows the introduction of a branched 
methyl group during a polyketide and/or polypeptide and/or hybrid 
polyketide/polypeptide synthesis. Figure 5 illustrates the use of BlmVIII PKS in engineering 
a polyketide biosynthesis that introduces a branched methyl group. 

The first formula in Figure 5 illustrates a polyketide synthesis mediated by 
6-deoxyerythronolide B synthase (DEBS), which normally catalyzes the biosynthesis of the 
erythromycin aglycone, 6-deoxyerythronolide B. The remaining formulas show how the use 
of the blmVIII methyltransferase (MT) domain at different points in the synthesis results in the 
introduction of a methyl group at different locations in the resulting product. 

In view of this illustration, one of skill in the art would appreciate that the 
blmVIII MT domain can be used in a wide variety of biosyntheses to introduce methyl 
branches. 

2. Use of the blm gene cluster to make thiazolidine, thiazoline, 
thiazole, bi-thiazolidine, bithiazoline, and bithiazole-containing 
compounds. 

The BlmIV and BlmIII NRPSs are characterized by unusual Cy domains as 
well as an unprecedented Ox domain, providing an efficient biosynthesis for a bithiazole 
structure. While thiazoline is the direct product of the Cy domain, the thiazoline-to-thiazole 
conversion generally is performed with an additional oxidation step. We identified at the 
C-terminus of NRPS-0 an additional domain that shows low, but significant, sequence 
homology to a family of putative oxidases/dehydrogenases, including the McbC protein of 
the microcin B17 synthase (Table 1). Microcin B17 synthase catalyzes the synthesis of the 
oxazole and thiazole-containing peptide antibiotic microcin B17, and McbC has been 
proposed to play a role in catalyzing the oxazoline/thiazoline-to-oxazole/thiazole conversion. 
Consequently, we propose that this extra domain at the C-terminus of NRPS-0 provides the 
oxidase/dehydrogenase activity for the biosynthesis of the bithiazole moiety of BLM, 
defining a novel Ox domain for NRPSs. 

It is noteworthy that a cell-free preparation from Sv ATCC15003 has been 
reported to catalyze the conversion of phleomycins to BLMs in the presence of NAD+, 
supporting the hypothesis that the bithiazole moiety of BLM results from stepwise 
oxidations of a bithiazoline precursor (Fig. 1A). (The phleomycin producer could be 
imagined to result from the loss of its Ox activity for the first thiazoline ring.) Given the 
wide distribution of thiazole or oxazole rings in natural products exhibiting an impressive 
array of biological activities, the cloning of the blmIV and blmIII genes and the identification 
of the Ox domain open many opportunities to study thiazole biosynthesis and to synthesize 
novel thiazole-containing molecules by engineering peptide biosynthesis. 

Representative thiazole syntheses using variants of the blm NRPS are 
illustrated in Figure 6. Note that in Figure 6, A_M and A_N refer to an A domain that activates 
an amino acid with R_M and R_N groups, respectively. A_C refers to an A domain that 
activates Cys (X = SH) or Ser (X = OH) that can be cyclized to form the oxazoline/thiazoline 
or oxazole/thiazole structures. DH is a dehydratase. In view of these representative 
examples, one of skill in the art would appreciate that the blm NRPS domain and its variants 
can be used in a wide variety of chemical syntheses to make thiazolidine, thiazoline, thiazole, 
bi-thiazolidine, bithiazoline, or bithiazole-containing compounds. 

3. Use of the blm gene cluster to make heterocyclic ring-containing 
compounds. 

Various blm modules can be used to produce heterocyclic ring-containing 
compounds. Such heterocycles include, but are not limited to, five-membered S- and 
N-containing compounds of the thiazolidine, thiazoline and thiazole family or the O- and 
N-containing compounds of the oxazolidine, oxazoline, and oxazole family. Again, the 
preparation of such compounds is illustrated in Figure 6. 

4. Use of the blm gene cluster to make sugars. 

In still another embodiment, the blm gene cluster or elements thereof can be 
used to make sugars. Such sugars include, but are not limited to L-sugars (with the BlmG 
epimerase), sugars modified by a carbamoyl group (e.g., using BlmD), and various 
disaccharides. Representative examples of such syntheses are illustrated in Figure 7. Such 
sugar biosynthesis genes can also be used to attach sugars onto other polyketide and/or 
peptide aglycones. 

G) Screening of products. 

Particularly where large combinatorial libraries are synthesized, e.g. using one 
or more modules and/or enzymatic domains of the blm gene cluster, it will often be desired to 
screen the resulting compound(s) for the desired activity. Methods of screening compounds 
(e.g. polypeptides, polyketides, sugars, thiazoles, etc.) for various activities of interest (e.g. 
cytotoxicity, antimicrobial activity, particular chemical activities, etc.) are well known to 
those of skill in the art. 

Where large numbers of compounds are produced, it is often desired to 
rapidly screen such compounds using "high throughput systems" (HTS). High throughput 
assay systems are well known to those of skill in the art and many such systems are 
commercially available (see, e.g., Zymark Corp., Hopkinton, MA; Air Technical Industries, 
Mentor, OH; Beckman Instruments, Inc., Fullerton, CA; Precision Systems, Inc., Natick, 
MA, etc.). These systems typically automate entire procedures including all sample and 
reagent pipetting, liquid dispensing, timed incubations, and final readings of the microplate 
in detector(s) appropriate for the assay. These configurable systems provide high 
throughput and rapid start up as well as a high degree of flexibility and customization. The 
manufacturers of such systems typically provide detailed protocols for the various high 
throughput screens. 

VII. In Vitro syntheses. 

In additional embodiments of this invention, bleomycins and other 
polyketides and/or polypeptides are synthesized and/or modified in vitro. Individual 
enzymatic domains or modules can be used in vitro to modify a unit and/or to add a single 
monomeric unit to a growing polyketide or polypeptide chain. In one approach, a 
megasynthetase providing all the desired synthetic activities is recombinantly expressed and 
then provided with the appropriate substrates and buffer system, e.g. in a bioreactor, to direct 
the synthesis of the desired product. In another approach, various PKSs and/or NRPSs are 
provided in different solutions and the growing polymer chains can be sequentially 
introduced into the plurality of solutions, each containing a single (or several) PKS or NRPS 
modules. In still another embodiment, the PKS and/or NRPS modules or enzymatic domains 
are provided attached to a solid support and a fluid containing the growing macromolecule 
is passed over the surface whereby the PKSs or NRPSs are able to react with the target 
substrate. 

In one preferred embodiment, a combinatorial library of polyketides or 
polypeptides, or combinations thereof, is created by using automated means to facilitate the 
sequential introduction of a multitude of polymeric chains, each attached to a solid support, 
to a collection of solutions, each containing a single PKS or NRPS module. These 
automated means can be used to systematically vary the sequence by which each polymeric 
chain is introduced into the various solutions, thereby creating a combinatorial library. 
Numerous methods are well known in the art to create combinatorial libraries of molecules 
by the sequential addition of monomeric units, for example as described in WO 97/02358. 

VIII. Kits. 

In still another embodiment, this invention provides kits for practice of the 
methods described herein. In one preferred embodiment, the kits comprise one or more 
containers containing nucleic acids encoding one or more of the blm gene cluster ORFs 
and/or one or more of the BLM PKS or NRPS modules or enzymatic domains. Certain kits 
may comprise vectors encoding the blm ORFs and/or cells containing such vectors. The kits 
may optionally include any reagents and/or apparatus to facilitate practice of the assays 
described herein. Such reagents include, but are not limited to, buffers, labels, labeled 
antibodies, bioreactors, cells, etc. 

In addition, the kits may include instructional materials containing directions 
(i.e., protocols) for the practice of the methods of this invention. Preferred instructional 
materials provide protocols utilizing the kit contents for creating or modifying a blm module or 
ORF and/or for synthesizing or modifying a molecule using one or more blm modules and/or 
enzymatic domains. While the instructional materials typically comprise written or printed 
materials, they are not limited to such. Any medium capable of storing such instructions and 
communicating them to an end user is contemplated by this invention. Such media include, 
but are not limited to, electronic storage media (e.g., magnetic discs, tapes, cartridges, chips), 
optical media (e.g., CD ROM), and the like. Such media may include addresses to internet 
sites that provide such instructional materials. 

EXAMPLES 

The following examples are offered to illustrate, but not to limit, the claimed 
invention. 

Example 1 

Bleomycin biosynthesis in Streptomyces verticillus ATCC15003: A model for hybrid 
peptide and polyketide biosynthesis. 

Here we report the cloning and characterization of the blm biosynthesis gene 
cluster from Sv ATCC15003 (Fig. 2). Sequence analysis and biochemical characterization of 
individual modules enabled us to align the nine NRPS and one PKS modules in a linear order 
to constitute the Blm megasynthetase complex (Fig. 1B). These studies revealed several 
unprecedented features for peptide and polyketide biosynthesis, setting the stage to 
investigate the molecular basis for intermodular communication between NRPS and PKS, 
and supported the wisdom of combining individual NRPS and PKS modules for 
combinatorial biosynthesis to make novel "unnatural" natural products from amino acids and 
short carboxylic acids. 

Materials and Methods. 

General procedures. 

Escherichia coli DH5α (Sambrook et al. (1989) Molecular Cloning: A 
Laboratory Manual, 2nd ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, 
USA), E. coli XL1-Blue MR (Stratagene, La Jolla, CA), E. coli BL21(DE-3) (Novagen, 
Madison, WI), and Sv ATCC15003 (American Type Culture Collection, Rockville, MD) 
were used in this work. pOJ446 (Agricultural Research Service Culture Collection, Peoria, 
IL), pQE60 (Qiagen, Santa Clarita, CA), pET28a and pET29a (Novagen), and other plasmids 
were from commercial sources. E. coli (Sambrook, supra.) and Sv ATCC15003 strains 
(Hopwood et al. (1985) Genetic Manipulation of Streptomyces: A Laboratory Manual, The 
John Innes Foundation, Norwich, UK) were cultured under standard conditions. 

Plasmid preparation was carried out by using commercial kits (Qiagen). Total 
Sv ATCC15003 DNA was isolated according to literature protocols (Hopwood et al. (1985) 
Genetic Manipulation of Streptomyces: A Laboratory Manual, The John Innes Foundation, 
Norwich, UK; Nagaraja et al. (1987) Methods Enzymol. 153: 166-198). Restriction enzymes 
and other molecular biology reagents were from commercial sources, and digestions and 
ligations followed standard methods (Sambrook, supra.). For Southern analysis, digoxigenin 
labelling of DNA probes, hybridization, and detection were performed according to the 
protocols provided by the manufacturer (Boehringer Mannheim Biochemicals, Indianapolis, 
IN). 

Automated DNA sequencing was carried out on an ABI Prism 377 DNA 
Sequencer (Perkin-Elmer/ABI, Foster City, CA), and this service was provided by either the 
DBS Automated DNA Sequencing Facility, UC Davis, or Davis Sequencing (Davis, CA). 
Data were analyzed by the ABI Prism Sequencing 2.1.1 software and the Genetics Computer 
Group (GCG) program (Madison, WI). 

Cloning and sequencing of the blm gene cluster. 

A genomic library of Sv ATCC15003 was constructed in pOJ446 according to 
literature procedures (Nagaraja et al. (1987) Methods Enzymol. 153: 166-198) and screened 
with probes made from both ends of the blmAB locus (Sugiyama et al. (1994) Gene 151: 
11-16; Calcutt and Schmidt (1994) Gene 151: 17-21), leading to the localization of a 140-kb 
contiguous DNA region, of which 100 kb is upstream (Fig. 2) and 40 kb is downstream (data 
not shown) of the blmAB genes. Heterologous NRPS probes were amplified from Sv 
ATCC15003 by polymerase chain reaction (PCR) according to literature procedures (Turgay 
and Marahiel (1994) Peptide Res. 7: 238-241) and used to screen the entire 140-kb DNA by 
Southern analysis under various hybridization conditions (Shen et al. (1999) Bioorg. Chem. 
27: 155-171). 

Prediction of substrate specificity of NRPSs. 

The nine Blm NRPS modules were compared with eighty-four modules from 
various bacterial and fungal NRPSs available in GenBank, including those with known or 
putative specificity for amino acids present in BLM. A table of overall similarities/identities 
was generated by PILEUP analysis of the A3 to A6 regions, and the residues lining the 
substrate binding pocket were determined by PILEUP/PRETTY analysis, by comparison with 
PheA (Conti et al. (1997) EMBO J. 16: 4174-4183). The percentage similarities for each 
Blm NRPS module were plotted against the rest of the NRPS modules to display the overall 
sequence homology across the A3 to A6 region. Those modules that showed significantly 
higher homology were selected to compare the amino acid residues that line the substrate 
binding pocket. 
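
The ranking step described above can be approximated, for illustration, by a simple percent-identity calculation over pre-aligned sequences. This Python sketch is not the GCG PILEUP/PRETTY procedure used in this work; it assumes the sequences have already been aligned to equal length with "-" as the gap character, and all names and the toy sequences are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


def rank_by_identity(query: str, references: dict) -> list:
    """Return (name, percent identity) pairs, most similar reference first."""
    scored = [(name, percent_identity(query, seq)) for name, seq in references.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy aligned fragments standing in for A3-to-A6 regions (illustration only).
query = "DAWTIAAV"
references = {"ref_A": "DAWTIAAV", "ref_B": "DAWTVAAI", "ref_C": "DLYN-SLI"}
for name, score in rank_by_identity(query, references):
    print(name, round(score, 1))
```

Reference modules at the top of such a ranking would then be the ones whose binding-pocket residues are compared in detail.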

Overproduction and biochemical characterization of the NRPS-1A and NRPS-6A 
proteins. 

Heterologous expression of the A domains in E. coli was performed according 
to literature procedures (Mootz and Marahiel (1997) J. Bacteriol. 179: 6843-6850). NRPS-1A 
(forward primer 5'-AAC CCA TGG CTG CTT CCC TGA CCC GCC TGG CC-3', SEQ 
ID NO:76, and reverse primer 5'-CCT AGA TCT ACG GGC AGG TGG GGC GGT-3', 
SEQ ID NO:77) and NRPS-6A (forward primer 5'-GGG AAT TCC ATA TGA TCC TCA 
CGT CCT TCC AC-3', SEQ ID NO:78, and reverse primer 5'-GGC AAG CTT GGG TGA 
GGG TCC GTT CGG T-3', SEQ ID NO:79) were amplified by PCR from Sv ATCC15003 
cosmid clones. The resulting 1.6-kb fragment of NRPS-1A was first cloned into the 
NcoI/BglII sites of pQE60 and then moved as an NcoI/HindIII fragment into the similar sites 
of pET29a to yield pBS10, and the resulting 1.6-kb fragment of NRPS-6A was directly 
cloned into the NdeI/HindIII sites of pET28a to yield pBS11. Introduction of pBS10 and 
pBS11 into E. coli BL21(DE-3) under standard expression conditions resulted in production 
of NRPS-1A (with an N-terminal S-tag and a C-terminal His6-tag) and NRPS-6A (with an 
N-terminal His6-tag), respectively. The soluble fractions of the fusion proteins were subjected 
sequentially to affinity chromatography on Ni-NTA resin and anion exchange 
chromatography on a Hyper-D column (PerSeptive Biosystems, Framingham, MA), yielding 
NRPS-1A and NRPS-6A purified to near homogeneity. 
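
As a quick consistency check on the cloning strategy, the primers given above can be scanned for the recognition sequences of the restriction enzymes named in the text. The Python sketch below is illustrative only; the dictionary and function names are hypothetical, and the recognition sequences are the standard palindromic sites for NcoI, BglII, NdeI and HindIII, so only one strand needs to be searched.

```python
# Standard recognition sequences of the enzymes named in the cloning steps.
SITES = {"NcoI": "CCATGG", "BglII": "AGATCT", "NdeI": "CATATG", "HindIII": "AAGCTT"}

# Primer sequences as given in the text (SEQ ID NOs: 76-79), written 5' to 3'.
PRIMERS = {
    "NRPS-1A forward": "AAC CCA TGG CTG CTT CCC TGA CCC GCC TGG CC",
    "NRPS-1A reverse": "CCT AGA TCT ACG GGC AGG TGG GGC GGT",
    "NRPS-6A forward": "GGG AAT TCC ATA TGA TCC TCA CGT CCT TCC AC",
    "NRPS-6A reverse": "GGC AAG CTT GGG TGA GGG TCC GTT CGG T",
}


def sites_in(primer: str) -> list:
    """Return the names of the listed enzymes whose site occurs in the primer."""
    seq = primer.replace(" ", "").upper()
    return [name for name, site in SITES.items() if site in seq]


for label, primer in PRIMERS.items():
    print(label, sites_in(primer))
```

Each primer carries the site expected from the cloning steps: NcoI and BglII for NRPS-1A, and NdeI and HindIII for NRPS-6A.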

Results and Discussion. 



Cloning of the blm gene cluster from ATCC15003. 

Davies and co-workers previously cloned two BLM resistance genes (blmA 
and blmB) from Sv ATCC15003 (Sugiyama et al. (1994) Gene 151: 11-16), and Calcutt and 
Schmidt (1994) Gene 151: 17-21, sequenced a 7.2-kb DNA fragment flanking the blmAB 
genes, revealing seven open reading frames (orfs), none of which were found to encode Blm 
NRPS or PKS enzymes. Given the precedent that antibiotic production genes commonly 
occur as a cluster in actinomycetes, we adopted an approach combining chromosomal 
walking from the blmAB resistance locus and DNA hybridization with heterologous NRPS 
probes to clone and identify the blm cluster, leading to the localization of 140-kb contiguous 
Sv ATCC15003 DNA. DNA sequencing of approximately 90 kb of the blm gene cluster, 
including the 7.2-kb blmAB locus, revealed 40 ORFs (Fig. 2). Preliminary functional 
assignments were made by comparison of the deduced gene products with proteins of known 
functions in the database. Among the ORFs identified from the blm cluster, we indeed found 
a PKS module flanked by several NRPS modules (a fact that supports the hybrid 
NRPS/PKS/NRPS hypothesis for BLM biosynthesis), along with several sugar biosynthesis 
genes and genes encoding other biosynthesis enzymes as well as several resistance and 
regulatory genes (Table 1). 

Noteworthy are the genes encoding the putative NRPS and PKS enzymes. 
The blmI, blmII, and blmXI genes encode NRPSs with an unusual architecture. In contrast to 
all known NRPSs, which are of modular organization with each module consisting 
minimally of a condensation (C), an adenylation (A), and a peptidyl carrier protein (PCP) 
domain (1), BlmI, BlmII, and BlmXI are discrete proteins homologous to individual domains 
of type I NRPSs. We have characterized BlmI as a type II PCP (18). The BlmII and BlmXI 
proteins could serve as candidates for type II condensation enzymes. It is unclear yet what 
role, if any, these discrete NRPS enzymes could play in BLM biosynthesis. 

The blmlll, blmlV, blmV, blmVI, blmVII, blmlX, and blmX genes encode 
modular NRPSs consisting of domains characteristic for known type I NRPSs (A special 
thematic issue on polyketide and nonribosomal polypeptide biosynthesis, (1997) Chem. Rev. 

25 97: 2463-2706), such as the A, PCP, C, and condensation/cyclization (Cy) domains (Konz et 
al. (1997) Chem. Biol. 4: 927-937), as well as an unprecedented oxidation (Ox) domain (see 
discussion below). However, BlmVI is unique among all the Blm NRPSs identified. Its N- 
terminal module (NRPS-5) consists of an atypical A domain, which bears a close 
resemblance to a family of acyl CoA synthases (Fitzmaurice and Kolattukudy (1997) J. 

30 Bacteriol. 179: 2608-2615; Fitzmaurice and Kolattukudy (1998) J. Biol. Chem. 273: 8033- 
8039), and an acyl carrier protein (ACP)-like domain (A special thematic issue on polyketide 
and nonribosomal polypeptide biosynthesis, (1997) Chem. Rev. 97: 2463-2706). Its C- 
terminal module is truncated and presumably interacts with BlmV to constitute the complete 
NRPS-3 module (Fig. IB). Also noteworthy are the C domain of NRPS-3 that lacks both 




His residues of the conserved HHxxxDG (SEQ ID NO:4) active site for transpeptidation 
(Stachelhaus et al. (1998) J. Biol. Chem., 273: 22773-22781) and the extra C domain at the 
C-terminus of BlmV. These unusual features associated with Blm VI and BlmV m&y play 
roles in the formation of the P-aminoalaninamide and the pyrimidine moieties of BLM, 
5 which are unprecedented in peptide biosynthesis. For example, we propose that the NRPS- 
4-activated Ser is first dehydrated into dehydroalanine before condensation-an analogous 
Thr-to-2,3-dehydroaminobutyric acid dehydration has been observed in syringomycin 
biosynthesis (Guenzi et al. (1998) J. Biol. Chem. 273: 32857-32863). Conjugate addition to 
dehydroalanine by Asn on the NRPS-3 module downstream, followed by an aminolysis to
cleave the Ser-Asn adduct off the Blm megasynthetase, furnishes the β-aminoalaninamide
moiety (Fig. 1B). The former reaction could be catalyzed by the C domain of NRPS-3 that
apparently is nonfunctional for normal transpeptidation due to the lack of the active-site
His residues, and the latter reaction could be catalyzed by the acyl CoA synthase-like
domain of NRPS-5 in a process that resembles the acyl CoA synthase-catalyzed synthesis of
acyl CoA from carboxylic acid (Stachelhaus et al. (1998) J. Biol. Chem. 273: 22773-22781;
Guenzi et al. (1998) J. Biol. Chem. 273: 32857-32863) but in the reverse direction in the
presence of an amino donor (Fig. 1B).
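
By way of illustration only, the absence of the conserved HHxxxDG motif from a C domain can be checked computationally. The following Python sketch scans a protein sequence for the motif. The two sequence fragments are hypothetical placeholders chosen for this example and are not Blm sequences, and the function name is likewise an assumption of this sketch.

```python
import re

# HHxxxDG: two His, any three residues, then Asp-Gly (x = any residue).
# The lookahead form reports overlapping matches as well.
MOTIF = re.compile(r"(?=HH...DG)")


def find_c_domain_motif(sequence):
    """Return the 0-based start position of every HHxxxDG match."""
    return [m.start() for m in MOTIF.finditer(sequence.upper())]


# Hypothetical toy fragments for illustration; NOT real Blm sequences.
typical_fragment = "AVLSHHIVGDGWSLRE"   # carries an intact HHxxxDG motif
atypical_fragment = "AVLSQNIVGDGWSLRE"  # both His residues replaced

print(find_c_domain_motif(typical_fragment))   # [4]
print(find_c_domain_motif(atypical_fragment))  # []
```

A C domain lacking both His residues, as described for NRPS-3, would give no match in such a scan, although the absence of the motif alone does not establish loss of function.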

The blmVIII gene encodes a PKS module consisting of domains characteristic
of known PKSs, such as ketoacyl synthase (KS), acyltransferase (AT), ketoreductase (KR),
and ACP, with malonyl CoA acting as an extending unit according to sequence comparison
of the AT domain (Haydock et al. (1995) FEBS Lett. 374: 246-248) (Fig. 1B). However, the
identification of an integrated methyltransferase (MT) domain (Kagan and Clarke (1994) 
Arch. Biochem. Biophys. 310: 417-427) in the middle of BlmVIII is unique, representing the 
first PKS from actinomycetes that contains an internal MT domain. The only other example
of a bacterial PKS that contains an internal MT domain is HMWP1 of the yersiniabactin
gene cluster (Pelludat et al. (1998) J. Bacteriol. 180: 538-546). It has been assumed that
fungal PKSs in general contain internal MTs for the introduction of methyl branches into the
polyketide products, as has been shown recently in lovastatin biosynthesis (Kennedy et al.
(1999) Science 284: 1368-1372).

The Blm megasynthetase-templated assembly of BLM.

According to the hybrid NRPS/PKS/NRPS model for BLM biosynthesis (Fig.
1A), we predict a linear modular organization of individual NRPS and PKS modules to
constitute the Blm megasynthetase. Thus, the first functional domain of the Blm
megasynthetase should be an NRPS module that initiates BLM biosynthesis by activating
L-Ser as an amino acylthioester to set the stage for transpeptidation. Chain elongation
proceeds by sequential incorporation of L-Asn, L-Asn, L-His, and L-Ala, requiring four
additional NRPS modules. In the next step, a malonate reacts with the resulting pentapeptide
intermediate to form a β-ketothioester intermediate that is subsequently methylated at the
α-position and reduced at the β-keto group. A PKS module presumably dictates all these
biosynthetic events and interacts with the aligned NRPS module upstream to channel the
growing peptide intermediate from an NRPS module to a PKS module. After one cycle of
polyketide elongation, peptide elongation is resumed by incorporation of an L-Thr residue.
This step is presumably catalyzed by an NRPS module that interacts with the upstream PKS
module to channel the growing polyketide intermediate (as far as the active site is concerned)
from a PKS module to an NRPS module. At this stage, methylation occurs at the pyrimidine
moiety of the growing intermediate, presumably catalyzed by a discrete methyltransferase;
chain elongation is continued by three additional NRPS modules that incorporate a β-Ala
and two L-Cys molecules sequentially. Finally, the fully assembled BLM
peptide/polyketide/peptide backbone is hydroxylated at the β-position of the His residue,
presumably by a discrete hydroxylase, and released from the Blm megasynthetase complex
via nucleophilic substitution of the RCO-S-PCP species by a terminal amine to form the
BLM aglycone. Intermediates after five of the nine proposed elongation steps were in fact
isolated as P-3, P-3A, P-3K, P-4, P-5, P-5m, P-6m, and P-6mo (Takita and Muroka (1990)
pages 289-309 in Biochemistry of Peptide Antibiotics: Recent Advances in the Biotechnology
of β-Lactams and Microbial Peptides, Kleinkauf, H. & von Dohren, H. eds., W. de Gruyter,
N.Y.), which presumably resulted from premature departure from the Blm megasynthetase
complex before the chain reaches its full length (Fig. 1B).
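
The module order predicted by this model can be summarized as a simple data structure. The Python sketch below is illustrative only: it encodes the building blocks in the order stated above, then counts the module types and locates the junctions at which the growing chain passes between NRPS and PKS modules. The variable and function names are assumptions of this sketch.

```python
from collections import Counter

# Proposed order of modules: (module type, building block incorporated).
ASSEMBLY_LINE = [
    ("NRPS", "L-Ser"),
    ("NRPS", "L-Asn"),
    ("NRPS", "L-Asn"),
    ("NRPS", "L-His"),
    ("NRPS", "L-Ala"),
    ("PKS", "malonate"),
    ("NRPS", "L-Thr"),
    ("NRPS", "beta-Ala"),
    ("NRPS", "L-Cys"),
    ("NRPS", "L-Cys"),
]

# Number of modules of each type in the proposed megasynthetase.
module_counts = Counter(kind for kind, _ in ASSEMBLY_LINE)


def hybrid_junctions(line):
    """Return index pairs (i, i + 1) where the module type changes."""
    return [
        (i, i + 1)
        for i in range(len(line) - 1)
        if line[i][0] != line[i + 1][0]
    ]


print(dict(module_counts))              # {'NRPS': 9, 'PKS': 1}
print(hybrid_junctions(ASSEMBLY_LINE))  # [(4, 5), (5, 6)]
```

The two junctions returned correspond to the NRPS-to-PKS and PKS-to-NRPS hand-offs of the hybrid NRPS/PKS/NRPS model.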

Most of the bacterial NRPS gene clusters characterized to date are organized
in operon-type structures, encoding multimodular NRPS proteins with individual modules
organized along the chromosome in a linear order that parallels the order of the amino acids
in the resultant peptides, i.e., following the "colinearity rule" for the NRPS-templated
assembly of peptides from amino acids (A special thematic issue on polyketide and
nonribosomal polypeptide biosynthesis, (1997) Chem. Rev. 97: 2463-2706; Cane et al.
(1998) Science 282: 63-68). Inspection of the blm gene cluster (Fig. 2) showed that the Blm
NRPS and PKS modules apparently are not organized according to the "colinearity rule" for
BLM biosynthesis (Fig. 1). [An exception to the "colinearity rule" was also noted in the
syringomycin synthetase gene cluster (Guenzi et al. (1998) J. Biol. Chem. 273:
32857-32863), and in fact, Grandi and co-workers have demonstrated recently in Bacillus subtilis
that neither the operon-type structure nor the physical linkage of individual modules is
essential for proper assembly and activity of the surfactin NRPS megasynthetase (Guenzi et
al. (1998) J. Biol. Chem. 273: 14403-14410).] Realizing that BLM biosynthesis cannot
be rationalized according to the "colinearity rule", we determined the substrate specificity of
individual NRPS and PKS modules in an attempt to shed light on the modular organization
of the Blm megasynthetase complex. Brick and co-workers postulated, based on the X-ray
structural analysis of the A domain of GrsA, PheA, that the region between core sequences
A3 and A6 represents the amino acid specificity determinant of an NRPS module (Conti et al.
(1997) EMBO J. 16: 4174-4183). Since the A domains in all known NRPSs share
significant sequence identity (ensuring that the main chain conformation of the enzymes is
likely to be very similar), they further proposed that the differing substrate specificity of
individual NRPS modules is mainly determined by the nature of the amino acids lining
the substrate binding pocket (Stachelhaus et al. (1999) Chem. Biol. 6: 493-505; Conti et al.
(1997) EMBO J. 16: 4174-4183). Given this structural information and the vast number of
NRPS sequences available in GenBank, we developed a novel approach for predicting
substrate specificity for NRPS modules by comparing both the overall sequence of the A3
to A6 region and the eight amino acid residues that line the substrate binding pocket.
While a constant level of similarity (30%-40%) was evident among all the NRPS modules
analyzed, most of the Blm NRPS modules showed striking similarities (50%-60%) to a
particular cluster of NRPS modules, as exemplified in Fig. 3A for NRPS-1 and NRPS-6.
Close examination of these modules clustered with higher similarities revealed that they
activate the same or very similar amino acids, based on which the putative substrate for the
NRPS in query could be predicted, i.e., NRPS-1 and NRPS-6A activate L-Cys and L-Thr,
respectively. These predictions were further supported by comparing the residues lining the
substrate binding pocket. For example, the amino acid residues lining the substrate binding
pocket for NRPS-1 and NRPS-6 are almost identical to those of NRPS modules that are known
to activate L-Cys and L-Thr, respectively, as shown in Fig. 3B. To verify the predicted
amino acid specificity, we overproduced and purified the NRPS-1A and NRPS-6A proteins
(Fig. 3C) and examined their substrate specificity according to the amino acid-dependent
ATP-PPi assay (Lee et al. (1970) Meth. Enzymol. 43: 585-602; Ku et al. (1997) Chem.
Biol. 4: 203-207). NRPS-1A and NRPS-6A indeed specifically activate L-Cys and L-Thr,
respectively, among the amino acids tested (Fig. 3D). The latter results greatly enhanced our
confidence in predicting the substrate specificity of an NRPS module by the above method.
We subsequently determined the substrate specificity for all the NRPS modules identified
from the blm gene cluster, and they in fact accounted for all nine amino acids required for
BLM biosynthesis (Fig. 2).
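
The pocket-residue comparison described above can be sketched as follows. This is a minimal illustration and not the procedure actually used: it scores only per-position identity between eight-residue signatures, and the reference signatures shown are made-up placeholders rather than the residues of Fig. 3B. All names are assumptions of this sketch.

```python
def pocket_identity(query, reference):
    """Fraction of identical positions between two equal-length signatures."""
    if len(query) != len(reference):
        raise ValueError("signatures must have the same length")
    matches = sum(q == r for q, r in zip(query, reference))
    return matches / len(query)


def predict_substrate(query, references):
    """Return (substrate, identity) for the best-matching reference."""
    scored = (
        (substrate, pocket_identity(query, signature))
        for substrate, signature in references.items()
    )
    return max(scored, key=lambda pair: pair[1])


# PLACEHOLDER eight-residue signatures, invented for this example only.
# They are NOT the binding-pocket residues of any real A domain.
REFERENCE_POCKETS = {
    "L-Cys": "ACDEFGHI",
    "L-Thr": "KLMNPQRS",
    "L-Val": "TVWYACDE",
}

query_pocket = "ACDEFGHK"  # hypothetical query differing at one position

print(predict_substrate(query_pocket, REFERENCE_POCKETS))  # ('L-Cys', 0.875)
```

In practice such a residue-level comparison would be combined with the overall A3 to A6 sequence similarity before a substrate is assigned.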

Using the substrate specificity of individual NRPS and PKS modules as a
guide, we can align the nine NRPS modules and one PKS module to constitute the Blm
megasynthetase as shown in Fig. 1B according to our hybrid NRPS/PKS/NRPS model for
BLM biosynthesis (Fig. 1A). Among all the PKS or NRPS systems examined so far, the
Blm megasynthetase consists of the largest number of individual proteins. The precise
interactions among all the Blm NRPS and Blm PKS proteins to constitute the Blm
megasynthetase complex, therefore, reflect a remarkable power of protein-protein
recognition (Guenzi et al. (1998) J. Biol. Chem. 273: 14403-14410; Gokhale et al. (1999)
Science 284: 482-485). Although we have yet to provide direct evidence supporting the
specific protein-protein interactions between the neighboring proteins, it is striking to note
that all the biosynthetic intermediates isolated are derailed from either PKS or NRPS
modules at the junctions between the interacting proteins (Fig. 1B). Since it is not difficult
to imagine that an intermediate is more likely to fall off the enzyme complex when it is
subjected to interpeptide transfer than to intrapeptide transfer, we view the latter observation
as strong evidence supporting the current model of the Blm megasynthetase
BlmIX/BlmVIII/BlmVII as a hybrid NRPS/PKS/NRPS model.

Recent biosynthetic studies on rapamycin in Streptomyces hygroscopicus
(Konig et al. (1997) Eur. J. Biochem. 247: 526-534), yersiniabactin in Yersinia
enterocolitica and Y. pestis (Pelludat et al. (1998) J. Bacteriol. 180: 538-546; Gehring et al.
(1998) Chem. Biol. 5: 573-586; Gehring et al. (1998) Biochemistry 37: 11637-11650) and
TA in Myxococcus xanthus (Paitan et al. (1999) J. Mol. Biol. 286: 465-474) are starting to
shed light on hybrid peptide and polyketide biosynthesis. Two models are emerging for the
alignment between an NRPS and a PKS module. The interacting NRPS and PKS modules
could be either covalently linked by arranging all domains in a linear order on the same
protein (Pelludat et al. (1998) J. Bacteriol. 180: 538-546; Gehring et al. (1998) Chem. Biol.
5: 573-586; Gehring et al. (1998) Biochemistry 37: 11637-11650; Paitan et al. (1999) J. Mol.
Biol. 286: 465-474) or physically located on two separate proteins, requiring specific
protein-protein recognition to ensure the correct pairing between the interacting modules
(Pelludat et al. (1998) J. Bacteriol. 180: 538-546; Konig et al. (1997) Eur. J. Biochem. 247:
526-534; Gehring et al. (1998) Chem. Biol. 5: 573-586; Gehring et al. (1998) Biochemistry
37: 11637-11650). Common to all these systems, however, are the unusual features associated
with the interacting modules, such as the lack of the AT domain of the PKS module in Ta1
(Paitan et al. (1999) J. Mol. Biol. 286: 465-474) and the lack of the A domain and the
presence of the Cy domain of the NRPS modules in both HMWP1 and HMWP2 (Pelludat et al.
(1998) J. Bacteriol. 180: 538-546; Gehring et al. (1998) Chem. Biol. 5: 573-586; Gehring et al.
(1998) Biochemistry 37: 11637-11650). While extremely intriguing, the latter features
complicate mechanistic analysis of these systems, making them less ideal candidates for
studying how NRPS and PKS integrate into a productive hybrid NRPS/PKS complex.

The BlmIX/BlmVIII/BlmVII system combines the features of both hybrid
NRPS/PKS and PKS/NRPS systems, serving as an ideal model for studying hybrid peptide
and polyketide biosynthesis. The fact that the BlmIX and BlmVII NRPS modules and
the BlmVIII PKS module are three separate proteins with a typical domain
organization for NRPS and PKS enzymes greatly simplifies the mechanistic analysis of the
hybrid NRPS/PKS/NRPS complex. We have found that the KS domain of BlmVIII is more
similar to the KSs of HMWP1 (Pelludat et al. (1998) J. Bacteriol. 180: 538-546) and Ta1
(Paitan et al. (1999) J. Mol. Biol. 286: 465-474), both of which catalyze the elongation of a
peptidyl intermediate with a malonate, than to KSs of type I PKSs. We attribute these subtle
differences to their unique reactivity that catalyzes the transfer of the peptidyl intermediate
from the PCP to the KS domain, which presumably takes place prior to chain elongation
(Fig. 4). Subsequent condensation catalyzed by the KS domain between the peptidyl
intermediate and malonyl-S-ACP results in the elongation of the growing peptide with a
carboxylic acid. Equally striking are the discoveries that the ACP domain of BlmVIII is
more similar to a PCP than to an ACP and that the C domain of BlmVII has an additional
N-terminal segment of about 50 amino acids that is rich in arginine, aspartic acid, and glutamic
acid. The latter feature is analogous to the N-terminal interpolypeptide linker for type I PKS,
which has recently been demonstrated to play a critical role in intermodular communication
(Gokhale et al. (1999) Science 284: 482-485). We propose that these unique features of the
ACP domain from the BlmVIII PKS module and the C domain from the BlmVII NRPS
module provide the molecular basis for the C domain to recognize the acyl-S-ACP as a
substrate. Subsequent condensation catalyzed by the C domain between acyl-S-ACP and
amino acyl-S-PCP results in the elongation of the growing polyketide (as far as this
condensation is concerned) with an amino acid (Fig. 4).

Novel domains for the Blm NRPS and PKS modules.

Various NRPS and PKS domains have been characterized, which are the
building blocks for the entire field of combinatorial biosynthesis. The success of
combinatorial biosynthesis depends critically upon the repertoire of these individual
domains. Genetic analysis of the blm gene cluster has uncovered several novel NRPS and
PKS domains. Without being bound to a particular theory, it is believed that BlmVI and
BlmV are involved in the biosynthesis of the β-aminoalaninamide and pyrimidine moieties of
BLM. In addition, the MT domain in BlmVIII, the Cy domains in BlmIV, and the Ox
domain in BlmIII are novel domains.

The BlmVIII PKS module apparently furnishes the "propionate" unit into
BLM in two steps by evolving a malonyl CoA-specifying AT domain coupled with a novel
S-adenosylmethionine-requiring MT domain, representing a new mechanism to introduce
methyl branches into polyketides (Fig. 4). This biosynthetic reaction sequence is
unprecedented for polyketide biosynthesis since all PKSs from actinomycetes examined to
date incorporate the alkyl branches into the resultant polyketides by selecting various alkyl
malonates as the extending units that are determined by the AT domains. Yet, feeding
experiments have unambiguously established that the polyketide moiety of BLM was
derived from an acetate and a methionine (Takita and Muroka (1990) pages 289-309 in
Biochemistry of Peptide Antibiotics: Recent Advances in the Biotechnology of β-Lactams and
Microbial Peptides, Kleinkauf, H. & von Dohren, H. eds., W. de Gruyter, N.Y.), a fact that
fits well with the observed unusual domain organization of the BlmVIII PKS module (Fig.
4). It is conceivable that the combination of this MT domain with an AT domain specific for
a methyl malonate extending unit (Haydock et al. (1995) FEBS Lett. 374: 246-248) could
result in the synthesis of polyketides with a gem-dimethyl moiety via engineering polyketide
biosynthesis. Such a gem-dimethyl group has been found to be a very important
pharmacophore for the epothilones, a family of hybrid peptide and polyketide metabolites
that exhibits a remarkable antitumor activity similar to taxol (Ojima et al. (1999) Proc.
Natl. Acad. Sci. USA 96: 4256-4261).

The BlmIV and BlmIII NRPSs are characterized by the unusual Cy domains
as well as the unprecedented Ox domain, providing an efficient biosynthesis for a bithiazole
structure. The Cy domain was first defined by Marahiel and co-workers in their study of
bacitracin biosynthesis in B. licheniformis (Konz et al. (1997) Chem. Biol. 4: 927-937), and
the Cy activity was demonstrated recently by Walsh and co-workers in their study of the
HMWP1 and HMWP2 proteins for yersiniabactin biosynthesis in Y. pestis (Gehring et al.
(1998) Chem. Biol. 5: 573-586; Gehring et al. (1998) Biochemistry 37: 11637-11650).
While thiazoline is the direct product of the Cy domain, the thiazoline-to-thiazole conversion
requires an additional oxidation step. We identified at the C-terminus of NRPS-0 an
additional domain that shows low, but significant, sequence homology to a family of putative
oxidases/dehydrogenases, including the McbC protein of the microcin B17 synthase (Table
1). Microcin B17 synthase catalyzes the synthesis of the oxazole- and thiazole-containing
peptide antibiotic microcin B17, and McbC has been proposed to play a role in catalyzing the
oxazoline/thiazoline-to-oxazole/thiazole conversion (Li et al. (1996) Science 274:
1188-1193; Milne et al. (1999) Biochemistry 38: 4768-4781). Consequently, we propose that this
extra domain at the C-terminus of NRPS-0 could provide the oxidase/dehydrogenase activity
needed for the biosynthesis of the bithiazole moiety of BLM, defining a novel Ox domain for
NRPSs. It is noteworthy that a cell-free preparation from Sv ATCC15003 has been reported
to catalyze the conversion of phleomycins to BLMs in the presence of NAD (Takita and
Muroka (1990) pages 289-309 in Biochemistry of Peptide Antibiotics: Recent Advances in
the Biotechnology of β-Lactams and Microbial Peptides, Kleinkauf, H. & von Dohren, H.
eds., W. de Gruyter, N.Y.), supporting the hypothesis that the bithiazole moiety of BLM
results from stepwise oxidations of a bithiazoline precursor (Fig. 1A). (The phleomycin
producer could be imagined to result from the loss of its Ox activity for the first thiazoline
ring.) Given the wide distribution of thiazole or oxazole rings in natural products (Ojima et
al. (1999) Proc. Natl. Acad. Sci. USA 96: 4256-4261; Li et al. (1996) Science 274:
1188-1193) exhibiting an impressive array of biological activities, the cloning of the blmIV/III
genes and the identification of the Ox domain open many opportunities to define the
mechanism for thiazole biosynthesis and to potentially synthesize novel thiazole-containing
molecules by engineering peptide biosynthesis.







Example 2 

Identification and characterization of a type II peptidyl carrier protein from the 
bleomycin producer Streptomyces verticillus ATCC 15003. 

Results. 

Cloning and sequence analysis of the blmI gene

In our effort to clone the gene cluster responsible for BLM biosynthesis, we
have determined 80 kb of DNA sequence from Sv ATCC 15003 (Fig. 8). Among the orfs
identified within the blm gene cluster is a small orf of 273 base pairs (bp), blmI, which is
located approximately 4 kb upstream of the previously characterized blmAB resistance locus
(Sugiyama et al. (1994) Gene 151: 11-16; Calcutt and Schmidt (1994) Gene 151: 17-21)
(Fig. 8B). The blmI gene encodes a protein of 90 amino acids with a molecular weight of
9957 and a pI of 6.52 (Fig. 8C). Computer-assisted analysis (Altschul et al. (1997) Nucleic
Acids Res. 25: 3389-3402) of the deduced amino acid sequence indicates that BlmI is very
similar to various PCP domains of NRPSs (around 40% identity and 60% similarity,
as shown in Figure 9). Like known PCP domains of NRPSs, BlmI has the highly conserved
signature motif of LGGXS, within which the serine residue is the site for
4'-phosphopantetheinylation (Stachelhaus and Marahiel (1995) FEMS Microbiol. Lett. 125:
3-14; Marahiel et al. (1997) Chem. Rev. 97: 2651-2673). The latter posttranslational
modification is generally necessary for peptide biosynthesis, converting the apo-PCP into the
functional holo-PCP (Marahiel et al. (1997) Chem. Rev. 97: 2651-2673; Walsh et al. (1997)
Curr. Opin. Chem. Biol. 1: 309-315). Based on sequence comparison, BlmI is most closely
related to PCPs and not to other kinds of carrier proteins that also share the same LGGXS
(SEQ ID NO:80) motif and undergo the same posttranslational 4'-phosphopantetheinylation,
such as the E. coli acyl carrier protein (ACP) (Lambalot and Walsh (1995) J. Biol. Chem. 270:
24658-24661), the ACP domain of type I PKS and the type II PKS ACP (Cox and Simpson
(1997) FEBS Lett. 405: 267-272; Carreras et al. (1997) Biochemistry 36: 11757-11761), the
ArCP domain (Gehring et al. (1998) Biochemistry 37: 2648-2659), and several
nodulation-related ACP-like proteins (Epple et al. (1998) J. Bacteriol. 180: 4950-4954;
Spaink et al. (1991) Nature 354: 125-130).

Overexpression of blmI in E. coli

To overexpress the blmI gene in E. coli, we directly amplified the blmI gene
by PCR from Sv ATCC 15003 genomic DNA and cloned it into the pQE-60 expression
vector to give pBS1 so that BlmI could be produced as a protein with a native N-terminus
and a His6-tag at its C-terminus. However, no production of the BlmI protein was detected,
as judged by sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE), upon
introduction of pBS1 into E. coli M15(pREP4) under the standard overexpression conditions
recommended by the manufacturer (Qiagen). We reasoned that the small BlmI protein with
its native N-terminus may not be stable in the heterologous host, and hence moved the blmI
gene from pBS1 into pET-29a to yield the second overexpression construct, pBS2. In the
latter construct, BlmI should be produced as a fusion protein with 27 extra amino acid
residues at its N-terminus, including an S-tag and the thrombin cleavage site, in addition to
the His6-tag at its C-terminus. Introduction of pBS2 into E. coli BL21(DE3) under the
standard overexpression conditions recommended by the manufacturer (Novagen) indeed
resulted in overproduction of BlmI. In fact, the bulk of the soluble protein was the
overproduced BlmI, which was easily purified by affinity chromatography using Ni-NTA
resin (Qiagen). It is noteworthy that fusion of the additional 23 amino acids to the
N-terminus of BlmI as in pBS2 and change of the expression system from E. coli M15(pREP4)
(pBS1) to E. coli BL21(DE3)(pBS2) dramatically improved the expression level of blmI.

In vivo 4'-phosphopantetheinylation of the BlmI protein

To establish BlmI as a type II PCP, we tested if it could serve as a substrate
for a PCP-specific 4'-phosphopantetheinyl transferase (PPTase). PPTases catalyze the
posttranslational modification of an apo-PCP into a holo-PCP by transferring the
4'-phosphopantetheine moiety from coenzyme A (CoA) to the conserved serine residue of
PCP, and this reaction has been developed recently into a general method to prepare various
holo-PCP, holo-ACP, or holo-ArCP from the corresponding apoproteins (Stachelhaus et al.
(1996) Chem. Biol. 3: 913-921; Gehring et al. (1998) Biochemistry 37: 2648-2659; Gehring
et al. (1998) Biochemistry 37: 11637-11650; Weinreb et al. (1998) Biochemistry 37:
1575-1584). Therefore, we decided to investigate the 4'-phosphopantetheinylation of BlmI
under both in vivo (Ku et al. (1997) Chem. Biol. 4: 203-207) and in vitro (Gehring et al.
(1998) Biochemistry 37: 11637-11650; Lambalot et al. (1996) Chem. Biol. 3: 923-936; Quadri
et al. (1998) Biochemistry 37: 1585-1595) conditions.

To examine 4'-phosphopantetheinylation of BlmI in vivo, we chose E. coli
OG7001 as the expression host, which is a β-alanine auxotroph derived from E. coli
BL21(DE3) by P1 co-transduction of the panD mutation from E. coli SJ16 (Epple et al.
(1998) J. Bacteriol. 180: 4950-4954). Upon introduction of pBS2 into E. coli OG7001, blmI
was exceptionally well expressed and the overproduced BlmI protein was readily purified.
However, high performance liquid chromatography (HPLC) analysis showed that the
purified BlmI was essentially in the apo-form (Fig. 10A), indicating that apo-BlmI was a
poor substrate for the E. coli endogenous PPTases, such as EntD and ACP synthase
(Lambalot et al. (1996) Chem. Biol. 3: 923-936; Walsh et al. (1997) Curr. Opin. Chem. Biol.
1: 309-315; Lambalot and Walsh (1995) J. Biol. Chem. 270: 24658-24661). To circumvent
the poor endogenous PPTase activity, we next co-expressed blmI with the gsp gene, which
was isolated from the gramicidin S producer Bacillus brevis and encodes a PPTase that is
known to 4'-phosphopantetheinylate heterologously produced PCPs in E. coli (Lambalot et
al. (1996) Chem. Biol. 3: 923-936; Ku et al. (1997) Chem. Biol. 4: 203-207). We
co-transformed pDPT-Gsp, in which the expression of the gsp gene was under the control of the
T5/Lac promoter (Ku et al. (1997) Chem. Biol. 4: 203-207), and pBS2 into E. coli OG7001.
BlmI was again very well expressed and the resulting BlmI protein was similarly purified.
HPLC analysis showed that at least 60% of overproduced BlmI was modified into the
holo-BlmI protein (Fig. 10B). (A PCP domain was similarly 4'-phosphopantetheinylated in vivo
before by co-expressing gsp in E. coli using pDPT-Gsp, and approximately 80% of the PCP
was produced in the holo-form (Ku et al. (1997) Chem. Biol. 4: 203-207).)

We next cultured E. coli OG7001(pBS2) and E. coli OG7001(pBS2/pDPT-Gsp)
in the presence of [3-³H]-β-alanine, a known biosynthetic precursor of
4'-phosphopantetheine (Stachelhaus et al. (1996) Chem. Biol. 3: 913-921; Epple et al. (1998) J.
Bacteriol. 180: 4950-4954). Specific incorporation of [3-³H]-β-alanine into the
4'-phosphopantetheine moiety of holo-BlmI was determined by autoradiographic analysis.
Thus, while fermentation of E. coli OG7001(pBS2) in the presence of [3-³H]-β-alanine led
to an IPTG-dependent overproduction of BlmI, little of the resulting BlmI protein was
³H-labeled, indicative of being produced in the apo-form. In contrast, fermentation of E. coli
OG7001(pBS2/pDPT-Gsp) in the presence of [3-³H]-β-alanine resulted in a significant
increase of IPTG-dependent incorporation of the ³H-label into the overproduced BlmI
protein, suggesting a specific incorporation of [3-³H]-β-alanine into holo-BlmI, presumably
in the 4'-phosphopantetheine moiety. There were several additional proteins that were also
weakly labeled by [3-³H]-β-alanine. However, both their expression and their incorporation
of the ³H-label were independent of either IPTG induction or the presence of Gsp; hence
these proteins were unrelated to BlmI. (Similar background labeling was reported before for
in vivo 4'-phosphopantetheinylation of other PCPs (Epple et al. (1998) J. Bacteriol. 180:
4950-4954)). We also purified the BlmI protein from E. coli OG7001(pBS2/pDPT-Gsp) and
demonstrated that it was the holo-BlmI protein that was specifically associated with the
³H-activity. Finally, we confirmed the identity of holo-BlmI by subjecting the purified BlmI
protein to MALDI-TOF mass spectral analysis (Weinreb et al. (1998) Biochemistry 37:
1575-1584). BlmI produced in the absence of the Gsp PPTase yielded a single peak with a
molecular weight of 13,952, suggesting that the produced BlmI protein is in the apo-form
(calc. 13,949). In contrast, BlmI produced in the presence of Gsp yielded two species with
molecular weights of 13,969 and 14,303, respectively. While the species with the molecular
weight of 13,969 represents apo-BlmI, a molecular weight of 14,303 unambiguously
confirmed the other protein as holo-BlmI (calc. 14,289). The latter result indicated that the
purified BlmI consisted of both the apo- and holo-BlmI proteins, in agreement with the
HPLC analysis results (Fig. 10B).
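
The mass assignments above can be checked by simple arithmetic. The Python sketch below uses only the values reported in this example; the variable and function names are illustrative assumptions.

```python
# Molecular weights reported in the text (Da).
CALC_APO = 13949            # calculated, apo-BlmI fusion protein
CALC_HOLO = 14289           # calculated, holo-BlmI fusion protein
OBS_APO_NO_GSP = 13952      # observed, produced without Gsp
OBS_APO_WITH_GSP = 13969    # observed, lighter species with Gsp
OBS_HOLO_WITH_GSP = 14303   # observed, heavier species with Gsp

# Mass added on conversion of the apo- to the holo-form.
expected_shift = CALC_HOLO - CALC_APO                   # 340
observed_shift = OBS_HOLO_WITH_GSP - OBS_APO_WITH_GSP   # 334


def deviation(observed, calculated):
    """Signed difference between an observed and a calculated mass (Da)."""
    return observed - calculated


print(expected_shift, observed_shift)
print(deviation(OBS_APO_NO_GSP, CALC_APO))      # 3
print(deviation(OBS_APO_WITH_GSP, CALC_APO))    # 20
print(deviation(OBS_HOLO_WITH_GSP, CALC_HOLO))  # 14
```

The observed shift between the two species agrees with the calculated apo-to-holo difference to within a few mass units, which is the basis for assigning the heavier species as holo-BlmI.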

In vitro 4'-phosphopantetheinylation of the BlmI protein

To investigate 4'-phosphopantetheinylation of BlmI in vitro, we chose the Sfp
protein as the preferred PPTase, which had been isolated before from the surfactin producer
Bacillus subtilis (Nakano et al. (1992) Mol. Gen. Genet. 232: 313-321). (Overexpression of
gsp in E. coli using pDPT-Gsp resulted in predominantly insoluble Gsp protein (Ku et al.
(1997) Chem. Biol. 4: 203-207).) The Sfp PPTase was overproduced in E. coli
MV1190(pUC8-Sfp) and purified to near homogeneity as described before (Quadri et al.
(1998) Biochemistry 37: 1585-1595; Nakano et al. (1992) Mol. Gen. Genet. 232: 313-321).
Upon incubation of the purified apo-BlmI with [³H-pantetheine]-CoA in the presence of the
Sfp PPTase, we examined the covalent incorporation of the
[³H-pantetheine]-4'-phosphopantetheine moiety from CoA into holo-BlmI by autoradiographic
analysis. Indeed, the apo-BlmI was quantitatively labeled by [³H-pantetheine]-CoA, and no
labeling was observed in the absence of either the apo-BlmI or the Sfp PPTase protein,
demonstrating that the Sfp PPTase can recognize apo-BlmI as a substrate and specifically
transfer the 4'-phosphopantetheine group from CoA to it, yielding holo-BlmI.

In vitro aminoacylation of BlmI

Once we established BlmI as a type II PCP that can be readily modified by
PCP-specific PPTases into the holo-BlmI protein, we tested if the holo-BlmI could be
aminoacylated in trans, which requires an A domain. Since BlmI has no cognate A domain of its
own, we turned our attention to another putative biosynthesis gene cluster we have cloned
previously from Sv ATCC15003, which encodes at least four NRPS modules and one PKS module.
We have established that this gene cluster is not clustered with the blm locus and is unrelated
to BLM biosynthesis. From this gene cluster, we amplified by PCR a 1579 bp fragment
encoding an A domain, named Val-A, which we predicted to have a molecular weight of
56,581 and a pI of 7.39. We cloned val-A into pET-28a to yield pBS3, in which Val-A
would be produced as a fusion protein with a His6-tag at the N-terminus. Introduction of
pBS3 into E. coli BL21(DE3) under the standard overexpression conditions recommended
by the manufacturer (Novagen) resulted in good overproduction of Val-A, predominantly in
soluble form, from which Val-A was purified by affinity chromatography using Ni-NTA
resin. The purified Val-A protein was active in the amino acid-dependent ATP-PPi
exchange assay (Lee and Lipmann (1970) Methods Enzymol. 43: 585-602; Ku et al. (1997)
Chem. Biol. 4: 203-207). Among the 23 amino acids tested, Val-A specifically activated
valine, an amino acid that is not required for BLM biosynthesis.

To carry out the aminoacylation in trans, we incubated the purified holo-BlmI
and Val-A in vitro in the presence of L-[¹⁴C(U)]valine and ATP (Stachelhaus et al. (1996)
Chem. Biol. 3: 913-921; Weinreb et al. (1998) Biochemistry 37: 1575-1584). The
aminoacylated holo-BlmI-L-[¹⁴C(U)]valine species was subjected to SDS-PAGE, and specific
attachment of L-[¹⁴C(U)]valine to holo-BlmI was determined by autoradiographic analysis.
Remarkably, the holo-BlmI was specifically labeled by L-[¹⁴C(U)]valine in the presence of
Val-A, indicative of the formation of the holo-BlmI-S-valine thioester. The in trans
aminoacylation between the holo-BlmI and Val-A proteins appeared to be very specific.
Neither incubation of L-[¹⁴C(U)]valine with Val-A, the apo-BlmI, or the holo-BlmI protein
alone, nor incubation of L-[¹⁴C(U)]valine with the Val-A and apo-BlmI proteins, resulted in
the detection of ¹⁴C-labeled BlmI protein.

Discussion.

Nonribosomal peptides and polyketides are two distinct classes of natural
products, yet they are assembled from amino acids and short carboxylic acids by NRPSs and
PKSs, respectively, by strikingly similar strategies (Cane et al. (1998) Science 282: 63-68).
These fascinating multifunctional enzyme complexes have been classified into two types
based on their gene organization and enzyme architecture. Type I enzymes are
multifunctional proteins consisting of domains for individual enzyme activities, and type II
enzymes are multienzyme complexes consisting of discrete proteins that are largely
monofunctional. While both type I and type II PKSs (Fig. 11A and 11C) have been well
characterized and account for the vast structural diversity found in polyketide biosynthesis
(Hopwood (1997) Chem. Rev. 97: 2465-2497), all NRPSs studied so far are exclusively
type I modular enzymes (Fig. 11B) (Kleinkauf and von Dohren (1996) Eur. J. Biochem.
236: 335-351; Marahiel et al. (1997) Chem. Rev. 97: 2651-2673; von Dohren et al. (1997)
Chem. Rev. 97: 2675-2705). It is tempting to speculate on the existence of a type II NRPS
that, analogous to a type II PKS (Shen and Hutchinson (1993) Science 262: 1535-1540; Bao et
al. (1998) Biochemistry 37: 8132-8138; Carreras and Khosla (1998) Biochemistry 37:
2084-2088), should consist of discrete proteins possessing enzyme activities such as the A
(Stachelhaus and Marahiel (1995) J. Biol. Chem. 270: 6163-6169), the PCP (Stein and Morris
(1996) J. Biol. Chem. 271: 15428-15435), or the C (Stachelhaus et al. (1998) J. Biol. Chem.
273: 22773-22781) domains of type I NRPSs (Fig. 11D). The fact that both the A
(Stachelhaus and Marahiel (1995) J. Biol. Chem. 270: 6163-6169; Konz et al. (1997) Chem.
Biol. 4: 927-937; Weinreb et al. (1998) Biochemistry 37: 1575-1584; Mootz and Marahiel
(1997) J. Bacteriol. 179: 6843-6850) and the PCP (Stachelhaus et al. (1996) Chem. Biol. 3:
913-921; Weinreb et al. (1998) Biochemistry 37: 1575-1584; Pfeifer et al. (1995)
Biochemistry 34: 7450-7459; Haese et al. (1994) J. Mol. Biol. 243: 116-122; Lambalot et al.
(1996) Chem. Biol. 3: 923-936; Quadri et al. (1998) Biochemistry 37: 1585-1595; Gehring et
al. (1997) Chem. Biol. 4: 17-24; Ku et al. (1997) Chem. Biol. 4: 203-207) domains of type I
NRPSs can act as independent enzymes supports the hypothesis of a type II NRPS.

We have now cloned and sequenced the blmI gene, overproduced and
characterized the BlmI protein as a bona fide type II PCP, and demonstrated that holo-BlmI
can be aminoacylated by a completely unrelated A domain, providing for the first time
genetic and biochemical evidence for a type II NRPS enzyme. We conclude that BlmI is a type
II PCP based on the following criteria. (1) The deduced amino acid sequence of the blmI
gene is highly homologous to various PCP domains of known NRPSs, in particular at the
signature motif of LGGXS within which the 4'-phosphopantetheine prosthetic group is
covalently attached to the serine residue (Marahiel et al. (1997) Chem. Rev. 97: 2651-2673;
Stachelhaus and Marahiel (1995) FEMS Microbiol. Lett. 125: 3-14). While the current
boundaries for a PCP domain in the literature were defined arbitrarily (Stachelhaus et al.
(1996) Chem. Biol. 3: 913-921) and varied from one PCP to another, we can now re-define a
PCP domain for the type I NRPS as a 90-amino acid peptide with approximately 45 amino
acids flanking each side of the essential serine residue in the LGGXS (SEQ ID NO:81) motif, in
light of this discrete BlmI type II PCP (Fig. 9). (2) The blmI gene has been successfully
expressed in E. coli, and fusion of a short peptide to the N-terminus of BlmI dramatically
improved its overproduction efficiency. While we cannot exclude the effect of different
systems on gene expression, i.e., E. coli M15(pREP4)(pBS1) vs. E. coli BL21(DE3)(pBS2),
we attribute the increase in expression efficiency to the stability of BlmI as an N-terminal
fusion protein instead of the otherwise labile BlmI protein with its native N-terminus. Since
BlmI was produced predominantly in the apo-form in E. coli, apo-BlmI apparently was not a
substrate for the endogenous PPTases, such as EntD or ACP synthase, excluding BlmI as an
ArCP or ACP, respectively. EntD and ACP synthase are known to
4'-phosphopantetheinylate apo-ArCP and ACP, respectively, to their holo-forms efficiently
(Lambalot et al. (1996) Chem. Biol. 3: 923-936; Walsh et al. (1997) Curr. Opin. Chem. Biol.
1: 309-315; Lambalot and Walsh (1995) J. Biol. Chem. 270: 24658-24661). (3) The
apo-BlmI protein serves as a substrate for PCP-specific PPTases that transfer the
4'-phosphopantetheine moiety from CoA to apo-BlmI to yield the holo-BlmI protein. We have
demonstrated this posttranslational modification for BlmI in vivo with the Gsp PPTase (Ku
et al. (1997) Chem. Biol. 4: 203-207) and in vitro with the Sfp PPTase (Gehring et al. (1998)
Biochemistry 37: 11637-11650; Lambalot et al. (1996) Chem. Biol. 3: 923-936; Quadri et al.
(1998) Biochemistry 37: 1585-1595), both of which have been extensively used in preparing
holo-PCPs. (4) The specific modification of apo-BlmI by 4'-phosphopantetheinylation has
been monitored by HPLC analysis (Fig. 10) (Weinreb et al. (1998) Biochemistry 37:
1575-1584) and by specific incorporation of [3-3H]-β-alanine in vivo (Stachelhaus et al. (1996)
Chem. Biol. 3: 913-921; Ku et al. (1997) Chem. Biol. 4: 203-207; Epple et al. (1998) J.
Bacteriol. 180: 4950-4954) and of [3H-pantetheine]-CoA in vitro (Gehring et al. (1998)
Biochemistry 37: 11637-11650; Lambalot et al. (1996) Chem. Biol. 3: 923-936; Quadri et al.
(1998) Biochemistry 37: 1585-1595), respectively, into the 4'-phosphopantetheine moiety of
the holo-BlmI protein. The identity of BlmI was finally confirmed by MALDI-Tof mass
spectral analysis that determined the molecular weight for both the apo- and holo-BlmI
proteins.
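
The domain definition proposed in criterion (1) can be illustrated with a short script. This is only a sketch under stated assumptions: the function name, the regular expression, and the toy sequence are hypothetical and are not taken from the sequence listing. It locates each LGGXS motif and extracts about 45 residues on either side of the motif serine.

```python
import re

# Signature motif from the text: L-G-G-X-S, where X is any residue and the
# serine carries the 4'-phosphopantetheine group.
PCP_MOTIF = re.compile(r"LGG.S")


def pcp_windows(protein, flank=45):
    """Return (serine_index, window) for every LGGXS hit in `protein`.

    `flank` residues are taken on each side of the motif serine and the
    window is clipped at the ends of the sequence. Indices are 0-based.
    """
    hits = []
    for match in PCP_MOTIF.finditer(protein):
        ser = match.start() + 4  # position of the S in LGGXS
        start = max(0, ser - flank)
        end = min(len(protein), ser + flank + 1)
        hits.append((ser, protein[start:end]))
    return hits


# Hypothetical toy sequence for illustration only; this is NOT BlmI.
toy = "A" * 50 + "LGGHS" + "V" * 60
ser, window = pcp_windows(toy)[0]
```

With 45 residues on each side, the window is 91 residues long including the serine itself, which matches the "90 amino acid" description only approximately.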

While individual domains of type I NRPSs can function independently and
several A (Stachelhaus and Marahiel (1995) J. Biol. Chem. 270: 6163-6169; Konz et al.
(1997) Chem. Biol. 4: 927-937; Weinreb et al. (1998) Biochemistry 37: 1575-1584; Mootz
and Marahiel (1997) J. Bacteriol. 179: 6843-6850) and PCP (Stachelhaus et al. (1996)
Chem. Biol. 3: 913-921; Weinreb et al. (1998) Biochemistry 37: 1575-1584; Pfeifer et al.
(1995) Biochemistry 34: 7450-7459; Haese et al. (1994) J. Mol. Biol. 243: 116-122;
Lambalot et al. (1996) Chem. Biol. 3: 923-936; Quadri et al. (1998) Biochemistry 37:
1585-1595; Gehring et al. (1997) Chem. Biol. 4: 17-24; Ku et al. (1997) Chem. Biol. 4: 203-207)
domains have been overproduced, purified, and biochemically characterized, aminoacylation
in trans has been successful only between PCPs and their cognate A domains (Stachelhaus et
al. (1996) Chem. Biol. 3: 913-921; Weinreb et al. (1998) Biochemistry 37: 1575-1584). No
aminoacylation between PCP and A domains from different NRPS modules has been
observed. These results led to the conclusion that there is a specific protein-protein
recognition between the A domain and its cognate PCP (Weinreb et al. (1998) Biochemistry
37: 1575-1584). Such domain-specific aminoacylation, in fact, should be beneficial in
maintaining the fidelity of a type I NRPS by providing additional "gating" against
misincorporation of non-specifically activated aminoacyl adenylate into the final peptide
product. Since a type II PCP such as BlmI lacks its cognate A domain, we asked if BlmI
could be aminoacylated by an unrelated A domain of a type I NRPS. Although we have yet
to determine the biochemical role of BlmI in vivo, the fact that the blmI gene is located in the
middle of the blm gene cluster suggests that it may be involved in BLM biosynthesis. To
avoid the ambiguity of selecting an A domain that may potentially interact with BlmI in
vivo, we preferred not to choose any A domain from the blm gene cluster to test if it could
aminoacylate BlmI in trans. We reasoned that an A domain that is unrelated to BlmI should
come from a gene cluster independent of BLM biosynthesis and should activate an amino
acid not required by BLM. We chose Val-A because it satisfied both requirements. Val-A is
an A domain of a type I NRPS from a gene cluster we have cloned previously from Sv
ATCC 15003 that has proven to be unrelated to BLM biosynthesis, and it specifically
activates valine among the 23 amino acids tested. Remarkably, BlmI was efficiently
aminoacylated by Val-A. The valine residue is specifically attached in a thioester linkage to
the terminal -SH of the 4'-phosphopantetheine moiety of the holo-BlmI protein, as evidenced
by the fact that the apo-BlmI was inactive under identical conditions.

Aminoacylation of holo-BlmI by Val-A represents the first example in which
an A domain aminoacylates a protein other than its cognate PCP domain. Since it has been
suggested that an A domain of a type I NRPS can transfer the activated aminoacyl adenylate
only to its cognate PCP domain because of the specific protein-protein recognition between
the two domains (Weinreb et al. (1998) Biochemistry 37: 1575-1584), the fact that BlmI is
aminoacylated by Val-A revealed a distinct feature of a type II PCP. It is very tempting to
speculate that type II PCPs such as BlmI may have broad intrinsic substrate specificity
toward either the aminoacyl adenylate, the A domain, or both. In fact, the latter feature is
reminiscent of the type II PKS ACPs, which have been shown to be interchangeable among
different PKS complexes (Shen and Hutchinson (1993) Science 262: 1535-1540; Bao et al.
(1998) Biochemistry 37: 8132-8138; Carreras and Khosla (1998) Biochemistry 37:
2084-2088). The biosynthesis of D-alanyl-lipoteichoic acid in Bacillus subtilis (Perego et al.
(1995) J. Biol. Chem. 270: 15598-15606) and Lactobacillus casei (Debabov et al. (1996)
J. Bacteriol. 178: 3869-3876) also involves a discrete ACP-like protein, the D-alanyl carrier
protein, although the latter clearly is structurally and functionally different from PCPs.

The results strongly suggest the existence of a type II NRPS. In fact, we have
already identified within the blm gene cluster two additional genes, blmII and blmXI (Fig.
1B), which encode type II C proteins based on sequence analysis (see Example 1).

Significance.

All NRPSs known to date are exclusively type I modular enzymes, that is,
multifunctional proteins consisting of domains, such as A (Stachelhaus and Marahiel (1995) J.
Biol. Chem. 270: 6163-6169), PCP (Stachelhaus et al. (1996) Chem. Biol. 3: 913-921), and C
(Stachelhaus et al. (1998) J. Biol. Chem. 273: 22773-22781), for individual enzyme activities
(Kleinkauf and von Dohren (1996) Eur. J. Biochem. 236: 335-351; Marahiel et al. (1997)
Chem. Rev. 97: 2651-2673; von Dohren et al. (1997) Chem. Rev. 97: 2675-2705), and
control the structural variations of the resulting peptide products by the multiple-carrier
thiotemplate mechanism (Cane et al. (1998) Science 282: 63-68; Stein and Morris (1996) J.
Biol. Chem. 271: 15428-15435). While individual domains of type I NRPSs can function
independently, aminoacylation in trans has been successful only between PCPs and their
cognate A domains (Stachelhaus et al. (1996) Chem. Biol. 3: 913-921; Weinreb et al. (1998)
Biochemistry 37: 1575-1584). We have cloned and sequenced the blmI gene, overproduced
and characterized the BlmI protein as a bona fide type II PCP, and demonstrated that the
holo-BlmI can be aminoacylated by a completely unrelated A domain. Our results provide
for the first time genetic and biochemical evidence to support the hypothesis of a type II
NRPS, setting the stage for formulating new research concepts to study peptide biosynthesis.
Genetic manipulation of type I NRPSs has already been successful in generating novel
peptides (Stachelhaus et al. (1995) Science 269: 69-72). An unprecedented type II NRPS
should shed new light on engineering NRPS proteins, greatly increasing our ability to access
peptides with even greater structural diversity.

Materials and methods 

General DNA manipulations 

Plasmid preparation and DNA extraction were carried out by using
commercial kits (Qiagen, Santa Clarita, CA), and all other manipulations were carried out
according to standard methods (Sambrook et al. (1989) Molecular Cloning: A Laboratory
Manual (2nd ed.), Cold Spring Harbor Laboratory Press, Cold Spring Harbor, USA). E. coli
strain DH5α was used as the host for general DNA propagations.

Overexpression of blmI in E. coli and purification of the BlmI protein

The blmI gene was amplified from Sv ATCC 15003 by PCR using a forward
primer of 5'-CCG CCCATGGGT GCT CCG CGT GGC GAG CGG ACC CGG CGC-3'
(SEQ ID NO:82, the NcoI site is underlined) and a reverse primer of 3'-CCT AGA TCT
CCG GTC CCG CTC CCC CGT-5' (SEQ ID NO:83, the BglII site is underlined). In order
to create the NcoI site, the original starting sequence of "ATG AGC" was changed to
"ATG GGT", which resulted in the change of the second amino acid from serine to glycine.
The first five codons of blmI were also optimized for overexpression in E. coli. The
PCR-amplified 0.3 kb NcoI-BglII fragment was cloned into the same sites of pQE-60 (Qiagen)
to form pBS1. Digestion of pBS1 with NcoI and HindIII and cloning the resulting 0.3 kb
NcoI-HindIII fragment into the same sites of pET-29a (Novagen, Madison, WI) yielded
pBS2.
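
As a quick consistency check on the primer design described above, the following sketch scans the forward primer for the engineered restriction site and confirms the stated serine-to-glycine codon change. The helper names are hypothetical; the recognition sequences and codon assignments are standard, and only the three codons needed here are listed.

```python
# Recognition sequences of the enzymes named in this section.
SITES = {"NcoI": "CCATGG", "BglII": "AGATCT", "HindIII": "AAGCTT"}

# Minimal codon table covering only the codons discussed in the text.
CODONS = {"ATG": "Met", "AGC": "Ser", "GGT": "Gly"}


def find_sites(primer, sites=SITES):
    """Map enzyme name to the 0-based start positions of its site in `primer`."""
    seq = "".join(primer.split()).upper()
    found = {}
    for name, site in sites.items():
        positions = []
        i = seq.find(site)
        while i != -1:
            positions.append(i)
            i = seq.find(site, i + 1)
        if positions:
            found[name] = positions
    return found


def translate_pair(two_codons):
    """Translate a string such as "ATG AGC" using the minimal table above."""
    return [CODONS[c] for c in two_codons.split()]


forward = "CCG CCCATGGGT GCT CCG CGT GGC GAG CGG ACC CGG CGC"
sites_in_forward = find_sites(forward)
native = translate_pair("ATG AGC")
engineered = translate_pair("ATG GGT")
```

The NcoI site (CCATGG) contains the ATG start codon followed by a G, which is why the second codon had to begin with G and the serine codon AGC was replaced by the glycine codon GGT.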

Expression of blmI in E. coli M15(pREP4)(pBS1) and in E. coli
BL21(DE3)(pBS2) and purification of the resulting BlmI protein by affinity chromatography on
Ni-NTA resin were carried out under the standard conditions recommended by Qiagen and
Novagen, respectively. The incubation temperature was lowered to 30 °C to improve
solubility. The purification of BlmI was monitored by SDS-PAGE on a 15% gel. The final
pure BlmI protein was desalted on a PD-10 column (Sephadex G-25, Pharmacia Biotech,
Piscataway, NJ) into 50 mM sodium phosphate buffer, pH 7.8, containing 200 mM NaCl, 10
mM MgCl2, 2 mM dithiothreitol (DTT), 1 mM EDTA, and 10% glycerol, and stored at -80 °C
for in vitro assays.
HPLC analysis and MALDI-Tof mass spectral determination

Samples of BlmI (30-70 µg) purified from E. coli OG7001(pBS2) or E. coli
OG7001(pBS2/pDPT-Gsp) were analyzed on a Nova-Pak C18 column (5mm x 10, Waters,
Milford, MA) using a Rainin DMAX HPLC unit. The column was developed by a linear
gradient of 0-50% acetonitrile in 0.1% trifluoroacetic acid in 25 min, followed by an additional
5 min at 50% acetonitrile, with a flow rate of 0.6 ml/min and detection at 280 nm.
MALDI-Tof mass spectral determination was performed on a Bruker Biflex III spectrometer at the
Facility for Advanced Instrumentation of the University of California, Davis.
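
The elution program described above can be written out as a simple function of time. This is a minimal sketch; the function names are hypothetical, and it assumes the gradient is exactly linear with no dwell volume or re-equilibration step.

```python
def percent_acetonitrile(t_min):
    """Acetonitrile percentage at time t (minutes) for the described program:
    a linear 0-50% gradient over 25 min, then a 5 min hold at 50%."""
    if t_min < 0 or t_min > 30:
        raise ValueError("time outside the 30 min program")
    if t_min <= 25:
        return 50.0 * t_min / 25.0
    return 50.0


def volume_delivered_ml(t_min, flow_ml_per_min=0.6):
    """Total mobile phase delivered after t minutes at the stated flow rate."""
    return flow_ml_per_min * t_min


halfway = percent_acetonitrile(12.5)
total_volume = volume_delivered_ml(30)
```

Under these assumptions the gradient rises by 2% acetonitrile per minute, and one complete run consumes about 18 ml of mobile phase.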

In vivo labeling of BlmI with [3-3H]-β-alanine

The β-alanine auxotroph E. coli strain OG7001 (Epple et al. (1998) J.
Bacteriol. 180: 4950-4954) was transformed with pBS2 and cultured under the same
conditions as for E. coli BL21(DE3) (Novagen). For co-expression of blmI with gsp,
pDPT-Gsp (Ku et al. (1997) Chem. Biol. 4: 203-207) was similarly transformed into E. coli
OG7001(pBS2) and the transformants were cultured in 2xYT (Debabov et al. (1996)
J. Bacteriol. 178: 3869-3876) in the presence of kanamycin (25 µg/ml) and chloramphenicol
(50 µg/ml). For the in vivo labeling experiment, cells from a 2 ml overnight culture of either
E. coli OG7001(pBS2) or E. coli OG7001(pBS2/pDPT-Gsp) were harvested, washed with M9
minimal medium (Debabov et al. (1996) J. Bacteriol. 178: 3869-3876), and re-suspended in
2 ml of M9 minimal medium. The latter were used as seed cultures (20 µl) to inoculate 1 ml M9
medium with kanamycin (25 µg/ml) or kanamycin (25 µg/ml) and chloramphenicol (50
µg/ml) for E. coli OG7001(pBS2) or E. coli OG7001(pBS2/pDPT-Gsp), respectively. The
resulting culture was incubated at 30 °C, 250 rpm to an OD600nm of 0.6, and to this was added 10
µCi of [3-3H]-β-alanine (50 Ci/mmol, American Radiolabeled Chemicals Inc., St. Louis, MO)
with or without IPTG (1 mM). Total proteins were resolved by SDS-PAGE on 15% gels
that were Coomassie blue-stained. To determine 3H-labeling of the overproduced holo-BlmI
protein, gels were soaked in Amplifier (Amersham, Arlington Heights, IL) for 20 min, dried
between two sheets of cellulose membrane (KOH Development Inc., Ann Arbor, MI), and
visualized by autoradiography on X-ray films (Fuji Medical Systems, Stamford, CT).

In vitro labeling of BlmI with [3H-pantetheine]-CoA

Expression of sfp in E. coli MV1190(pUC8-Sfp), purification of the Sfp
PPTase to homogeneity, and 4'-phosphopantetheinylation of apo-BlmI by Sfp in vitro were
carried out essentially according to literature procedures (Quadri et al. (1998) Biochemistry
37: 1585-1595; Nakano et al. (1992) Mol. Gen. Genet. 232: 313-321). A typical 100 µl
assay solution contained 26 µM apo-BlmI, 2.9 µM Sfp, 25 µM [3H-pantetheine]-CoA (0.9
µCi, 40 Ci/mmol), 10 mM MgCl2, and 5 mM DTT, in 75 mM MES/NaOAc buffer, pH 6.0.
After 30 min incubation at 37 °C, the assays were stopped by addition of 5 µl of bovine
serum albumin (0.2 mg/ml) and 0.9 ml of cold 10% (v/v) trichloroacetic acid (TCA). The
precipitated proteins were collected by centrifugation at 14,000 rpm, 20 min, 4 °C
(Eppendorf 5415C centrifuge), washed with 10% TCA three times, and resolved by
SDS-PAGE on a 15% gel. The 3H-activity incorporated into holo-BlmI was similarly determined
by autoradiography as described for in vivo labeling of holo-BlmI with [3-3H]-β-alanine.
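
The quantities in the typical 100 µl phosphopantetheinylation assay can be converted to absolute amounts with a few lines of arithmetic. The sketch below is illustrative only; the function names are hypothetical and the calculation takes the stated concentrations and the 0.9 µCi figure at face value.

```python
def nmol(conc_uM, vol_ul):
    """Amount in nmol: µM times µl gives pmol, so divide by 1000."""
    return conc_uM * vol_ul / 1000.0


def effective_specific_activity(uCi, conc_uM, vol_ul):
    """Radioactivity per amount of substrate in the assay.

    Returned in µCi per nmol, which is numerically equal to Ci per mmol.
    """
    return uCi / nmol(conc_uM, vol_ul)


VOLUME_UL = 100.0
apo_blmi_nmol = nmol(26.0, VOLUME_UL)  # carrier protein substrate
sfp_nmol = nmol(2.9, VOLUME_UL)        # PPTase
coa_nmol = nmol(25.0, VOLUME_UL)       # labeled CoA donor
coa_activity = effective_specific_activity(0.9, 25.0, VOLUME_UL)
```

On these numbers the assay holds 2.6 nmol apo-BlmI, 0.29 nmol Sfp, and 2.5 nmol CoA, so donor and acceptor are close to equimolar. The 0.9 µCi in 2.5 nmol works out to 0.36 Ci/mmol in the assay, which would be consistent with the 40 Ci/mmol stock having been diluted with unlabeled CoA, although the text does not state this.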

Overexpression of val-A in E. coli and purification and assay of the Val-A 
protein 

The val-A fragment was amplified from Sv ATCC 15003 by PCR using a
forward primer of 5'-GGA ATT C CA TAT G GG CAC CAC CGT CGC CGC G-3' (SEQ ID
NO:84, the NdeI site is underlined), and a reverse primer of 3'-GGC AAG CTT GGG ACC
GGG CGT GGA GCG C-5' (SEQ ID NO:85, the HindIII site is underlined). The
PCR-amplified 1.6 kb NdeI-HindIII fragment was cloned into the same sites of pET-28a (Novagen)
to yield pBS3. Expression of val-A in E. coli BL21(DE3)(pBS3) and purification of the
resulting Val-A protein by affinity chromatography on Ni-NTA resin were carried out under
the standard conditions recommended by Novagen.

Amino acid-dependent ATP-PPi exchange assays were performed essentially according
to the literature procedures (Ku et al. (1997) Chem. Biol. 4: 203-207; Lee and Lipmann
(1970) Methods Enzymol. 43: 585-602). A typical 100 µl assay solution contained 180 nM
Val-A, 1 mM ATP, 0.1 mM PPi with 0.2 µCi of 32P-PPi (11.75 Ci/mmol, NEN Life Science
Products, Inc., Boston, MA), 1 mM MgCl2, 0.1 mM EDTA, and 1 mM L-amino acid in 50
mM sodium phosphate buffer, pH 7.8. After 30 min incubation at 30 °C, the assays were
stopped by addition of 0.9 ml of cold 1% (w/v) activated charcoal in 3% (v/v) perchloric
acid. The precipitates were collected on glass fiber filters (2.4 cm, G-4, Fisher, Pittsburgh,
PA), washed successively with 10 ml of 0.2 M sodium phosphate buffer, pH 8.0, 4 ml water,
and 1 ml of ethanol, and dried in air. The filters were mixed with 7 ml of scintillation fluid
(ScintiSafe Gel, Fisher) and counted on a Beckman LS-6800 scintillation counter to
determine the radioactivity.




In vitro aminoacylation of holo-BlmI by Val-A

The aminoacylation of holo-BlmI was carried out essentially according to
literature methods (Stachelhaus et al. (1996) Chem. Biol. 3: 913-921; Weinreb et al. (1998)
Biochemistry 37: 1575-1584). A typical 100 µl assay solution contained 180 nM Val-A,
1.5-2.8 µM apo- or holo-BlmI, 35 µM L-[14C(U)]valine (283 mCi/mmol, NEN Life Science
Products, Inc., Boston, MA), 5 mM ATP, 10 mM MgCl2, and 5 mM DTT in 75 mM
Tris-HCl buffer, pH 8.0. The reactions were started by the addition of ATP and, after incubation
at 37 °C for 30 min, were stopped by addition of 0.9 ml of cold 7% (v/v) TCA. The
precipitated proteins were collected by centrifugation at 14,000 rpm, 20 min, 4 °C
(Eppendorf 5415C centrifuge) and resolved by SDS-PAGE on a 15% gel. The radioactivity
incorporated into the holo-BlmI-L-[14C(U)]valine species was similarly determined by
autoradiography as described for in vivo labeling of holo-BlmI with [3-3H]-β-alanine.

Example 3: 

Cloning and characterization of a phosphopantetheinyl transferase from the
bleomycin-producing Streptomyces verticillus ATCC 15003

Multienzyme complexes exist for acyl group activation and transfer reactions
in the biogenesis of fatty acids, the polyketide family of natural products (e.g. erythromycin,
tetracycline), and almost all non-ribosomal peptides (e.g. vancomycin, cyclosporin,
penicillin). All of these complexes contain one or more small proteins, ~80-100 amino acids
long, either as separate subunits or as integrated domains, that function as carrier proteins for
the growing acyl chain (acyl-, peptidyl-, and aryl- carrier proteins, abbreviated as ACP, PCP,
and ArCP). They are converted from inactive apo-forms to functional holo-forms by the
covalent attachment of the 4'-phosphopantetheine moiety of coenzyme A to a conserved
serine residue of the carrier-protein substrate. This essential post-translational modification
is catalyzed by a family of enzymes known as phosphopantetheinyl transferases (PPTases)
(Lambalot et al. Chem. Biol. (1996) 3:923-936; Walsh et al. Curr. Opin. Chem. Biol.
(1997) 1:309-315).

Research in the field of polyketide and non-ribosomal peptide biosynthesis
has been hampered by the inability to fully modify, and thus convert to the active form, some
polyketide synthases (PKS) and nonribosomal peptide synthetases (NRPS) when overproduced in
heterologous hosts, presumably because the host PPTases are unable to effectively modify
these overexpressed protein substrates. Our group is currently involved in the
characterization of the gene cluster responsible for the biosynthesis of the antitumor drug
bleomycin in Streptomyces verticillus ATCC 15003. As bleomycin synthetase is a hybrid
NRPS/PKS enzyme, we decided to obtain a PPTase from the producing organism in order to
use it in vitro or in vivo by coexpression with the synthetase genes to produce properly
modified, active synthetases for our studies.

Results and Discussion 

Cloning of the pptA gene from S. verticillus ATCC 15003.

The similarities among PPTases from different organisms are reduced to two
short motifs separated by 40-45 residues: (V/I)G(V/I)D and (F/W)(S/C/T)XKE(A/S)hhK
(Lambalot et al. Chem. Biol. (1996) 3:923-936; Walsh et al. Curr. Opin. Chem. Biol.
(1997) 1:309-315). Our previous attempts to amplify PPTase sequences from S. verticillus
chromosomal DNA using degenerate primers based on the two conserved motifs were
unsuccessful (unpublished results), so we decided to narrow our target. PPTases have been
classified into two groups according to their specificity for the carrier-protein substrate:
PPTases involved in polyketide/fatty acid biosynthesis use acyl carrier proteins (ACPs) as
substrates, while those for non-ribosomal peptide biosynthesis use peptidyl carrier proteins
(PCPs) or aryl carrier proteins (ArCPs) (Walsh et al. Curr. Opin. Chem. Biol. (1997)
1:309-315). Several "NRPS-type" PPTase sequences were used to screen the databases
for actinomycete homologues, and four proteins of unknown function were found:
NshC from Streptomyces actuosus (Li et al. Gene (1990) 91:9-17), SC5A7.23 from S.
coelicolor (GenBank AL031107), an unnamed protein from Streptomyces sp. strain TH1
(Mori et al. J. Bacteriol. (1997) 179:5677-5683), and Rv2794c (later renamed PptT
(Quadri et al. Chem. Biol. (1998) 5:631-645)) from Mycobacterium tuberculosis (GenBank
AL008967). The alignment of the actinomycete sequences showed the two motifs conserved
in all PPTases and an additional motif, the "THC" motif: PXWPXGX2GS(M/L)THCXGY
(SEQ ID NO:86), located about 15 amino acids upstream of the (V/I)G(V/I)D motif (SEQ ID
NO:87). The "THC" motif is not universally conserved in all PPTases, but it can also be
detected in some non-actinomycete PPTases like EntD (Coderre et al. J. Gen. Microbiol.
(1989) 135:3043-3055). Using a recently developed method of PCR primer design (the
CODEHOP strategy (COnsensus-DEgenerate Hybrid Oligonucleotide Primer) (Rose et al.
Nucleic Acids Res. (1998) 26:1628-1635)), two primers were designed around the typical
C-terminal PPTase motif (primers KEA-1: 5'-T GCA GCA GAA CAG GAG GCK NYC CCA
NKG-3' (SEQ ID NO:88) and KEA-2: 5'-TG GGT CAG CGG GTA CCA NRC YTT RWA-3'
(SEQ ID NO:89, H=C+A, N=A+C+T+G, Y=C+T, K=G+T, R=A+G, W=T+A)), and one
primer was designed from the "THC" motif (primer THC: 5'-C GGC ATG GTC GGC TCC
HTN ACN CAY TG-3' (SEQ ID NO:90, H=C+A, N=A+C+T+G, Y=C+T, K=G+T,
R=A+G, W=T+A); this motif is not universally conserved in PPTases of all organisms).
Using S. verticillus chromosomal DNA as template, no amplification product was detected
using the THC and the KEA-1 primers. The set of primers THC/KEA-2 successfully
amplified a single band of the expected size (about 250 bp), which was gel-purified and
cloned. Eight individual clones were sequenced, and all of them proved to be identical
(except for differences due to primer utilization) and highly similar to the putative actinomycete
PPTases. The PCR fragment was used as a probe to screen an S. verticillus genomic library
by colony hybridization. Of the 10,000 colonies screened, 25 positive clones were
identified, and then confirmed by Southern analysis to contain the same 4.6-kb BamHI
hybridizing band. The 4.6-kb DNA fragment was subcloned, and the nucleotide sequence
of a 1,761-bp BamHI-SalI region was determined (SEQ ID NO: 3).
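
The degeneracy of the three primers can be worked out directly from the base-code legend given with the sequences. The sketch below is illustrative and its function names are hypothetical. It follows the legend exactly as printed in the text, in which H stands for C+A; in the standard IUPAC code that pair is written M and H means A+C+T, so the result for the THC primer depends on which reading is intended.

```python
from itertools import product

# Legend as given in the text (H = C+A there, unlike standard IUPAC).
LEGEND = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "H": "CA", "N": "ACTG", "Y": "CT", "K": "GT", "R": "AG", "W": "TA",
}


def clean(primer):
    """Remove the spacing used for readability and upper-case the primer."""
    return "".join(primer.split()).upper()


def degeneracy(primer):
    """Number of distinct oligonucleotides represented by a degenerate primer."""
    count = 1
    for base in clean(primer):
        count *= len(LEGEND[base])
    return count


def expand(primer):
    """List every non-degenerate sequence in the primer pool."""
    pools = [LEGEND[base] for base in clean(primer)]
    return ["".join(combo) for combo in product(*pools)]


KEA_1 = "T GCA GCA GAA CAG GAG GCK NYC CCA NKG"
KEA_2 = "TG GGT CAG CGG GTA CCA NRC YTT RWA"
THC = "C GGC ATG GTC GGC TCC HTN ACN CAY TG"

pool_sizes = {name: degeneracy(seq)
              for name, seq in [("KEA-1", KEA_1), ("KEA-2", KEA_2), ("THC", THC)]}
thc_pool = expand(THC)
```

In each primer the degenerate positions are confined to the 3' end, with a fixed consensus clamp at the 5' end, which is the layout the CODEHOP strategy is named for.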

Sequence analysis of the pptA locus. 

The sequence of the 1,761-bp BamHI-SalI fragment was analyzed for coding
regions by using the CODONPREFERENCE and TESTCODE programs of the GCG
package (Genetics Computer Group, Madison, Wisconsin). Two complete ORFs (pptA,
orf3) and two incomplete ORFs (orf1, orf4) were identified within the sequenced region
(Figure 13). The first ORF from left to right (designated orf1) starts outside the analyzed region
and ends with a TGA codon at position 248 of the sequenced fragment. Comparison of the
deduced product of orf1 with proteins in databases showed similarities with Rv2795c from
Mycobacterium tuberculosis (GenBank AL008967) and SC5A7.22 from S. coelicolor
(GenBank AL031107), both of unknown function. The second ORF, pptA, contains the
sequence amplified by PCR and used for the cloning of this locus. It comprises 741
nucleotides, starting with a GTG codon (position 245), which is coupled to the stop codon of
orf1, and ending with a TAA codon. The starting codon of pptA is preceded by a potential
ribosomal binding site (RBS), GGGAG. The overall (76.6%) and third codon position
(93.9%) G+C contents and the codon usage of pptA are similar to those found in other
Streptomyces genes, with the exception of the stop codon (TAA), which is most uncommon
in this group of organisms (Wright et al. Gene (1992) 113:55-65). The pptA gene encodes a
protein of 246 amino acids with a predicted molecular mass of 25,619 Da and a pI of 4.76,
which contains the conserved PPTase motifs. Database searches with PptA showed
significant similarities to the putative actinomycete PPTases (39-52%/48-61%
identity/similarity) and to confirmed bacterial PPTases such as EntD from E. coli
(17%/24% identity/similarity) (Lambalot et al. Chem. Biol. (1996) 3:923-936). The third
ORF, orf3, is separated from pptA by an apparently noncoding DNA region of 153 bp, and it
is transcribed in the opposite, convergent direction with respect to orf1-pptA. The gene orf3
comprises 240 nucleotides, starting with an ATG codon (position 1358) and ending with
TGA. The starting codon of orf3 is preceded by the sequence GAAGG, a potential RBS.
The deduced product of orf3 is a protein of 79 amino acids with a predicted mass of
7,555 Da and a pI of 7.17. The Orf3 protein shows similarities to the N-terminal region of
SC5H1.35c, a protein of unknown function from S. coelicolor (GenBank AL049863).
Analysis of Orf3 with the SignalP program (Nielsen et al. Protein Engineer. (1997) 10:1-6)
predicts an N-terminal signal peptide which would be cleaved between residues 27 and 28
(ALA-DS), suggesting that the mature protein (52 amino acids, 5,099 Da, pI 4.31) would be
secreted. Between orf3 and orf4 there is an apparently noncoding region of 251 nucleotides.
The orf4 gene is transcribed in the opposite, divergent direction with respect to orf3. It starts
with an ATG codon at position 1610, preceded by a potential RBS (GGAGG), and ends outside
the sequenced fragment. The deduced protein product (50 amino acids) of the incomplete
orf4 contains a potential NAD/FAD binding motif, GXGX2GX3GX6G (Scrutton et al.
Nature (1990) 343:38-43), showing low similarities to diverse oxidoreductases.
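
Several of the figures quoted in this analysis can be cross-checked with simple arithmetic. The following sketch uses hypothetical helper names and assumes that each stated ORF length includes the stop codon and that position 248 is the last base of the orf1 TGA codon; neither assumption is stated explicitly in the text.

```python
def residues_from_orf(nt_length):
    """Amino acids encoded by an ORF whose length includes the stop codon."""
    if nt_length % 3:
        raise ValueError("ORF length is not a multiple of 3")
    return nt_length // 3 - 1


def orf_end(start, nt_length):
    """1-based inclusive end coordinate of an ORF on the forward strand."""
    return start + nt_length - 1


def overlap_nt(upstream_end, downstream_start):
    """Bases shared by two same-strand ORFs (0 if they do not overlap)."""
    return max(0, upstream_end - downstream_start + 1)


def gc_percent(seq):
    """G+C content of a nucleotide string, in percent."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


ppta_residues = residues_from_orf(741)   # pptA
orf3_residues = residues_from_orf(240)   # orf3
ppta_end = orf_end(245, 741)
coupling = overlap_nt(upstream_end=248, downstream_start=245)
rbs_gc = gc_percent("GGGAG")
```

Under these assumptions the lengths agree with the reported 246 and 79 residues, and the pptA start codon shares four bases with the end of orf1, which corresponds to a GTGA arrangement in which the GTG start and the TGA stop overlap. This is one common form of the translational coupling the text describes.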

Heterologous expression and biochemical characterization of PptA. 

In order to test if pptA actually encodes a functional PPTase, we decided to 
overproduce and purify the PptA protein, and assay its catalytic competence on putative 
substrate proteins or domains. The pptA coding sequence was amplified by PCR and cloned 

25 into the T5-promoter-based pQE-70 vector, yielding plasmid pQEPPT, in such a way that a 
hexahistidine tag would be added at the C-terminus of the protein. Expression of the 
pQEPPT construct in E. coli M15(pREP4) resulted in the overproduction of soluble His- 
tagged PptA which was readily purified by affinity chromatography on Ni-NTA agarose 
under non-denaturing conditions (FIGURE). Because pptA belongs, by sequence similarity,
to the subfamily of PPTases involved in nonribosomal peptide synthesis, we first assayed its
activity using two different apo-PCPs as protein substrates. The first one, BlmI, has been
previously characterized in our laboratory as a discrete peptidyl carrier protein, or type II
PCP, whose gene is found within the bleomycin-biosynthesis gene cluster of S. verticillus
(Du et al. Chem. Biol. (1999) 6:507-517). For the second PCP substrate we used BlmX, a
bimodular NRPS protein encoded in the same cluster (Fig. 2), as a source of a type I PCP,
i.e. a PCP included in a multidomain NRPS. For the production of this type I PCP, we
amplified by PCR a 1,898 bp fragment encoding the adenylation and PCP domains from the
second module of BlmX. This DNA fragment was cloned into pMAL-c2x to yield
pMAL1617, in which the type I PCP would be produced as a maltose-binding protein (MBP)
fusion, MBlmX-2, with a predicted molecular mass of 108.5 kDa. Introduction of
pMAL1617 into E. coli TB1 resulted in good overproduction of MBlmX-2, about 40%
soluble, which was purified by affinity chromatography using amylose resin. To test the
PPTase activity, we incubated the purified PptA with BlmI and MBlmX-2 as putative protein
substrates in the presence of [3H]-(pantetheinyl)-CoASH, and the tritiated products were
subjected to SDS electrophoresis and autoradiography. The well-characterized PPTase Sfp
from B. subtilis, which exhibits a broad specificity for its protein substrate (Quadri et al.
Biochemistry (1998) 37:1585-1595), was included as a positive control. In these
experiments PptA exhibited a robust phosphopantetheinylation activity on both BlmI and
MBlmX-2. Having demonstrated that PptA does in fact have PPTase activity on both type I
and type II PCP substrates from nonribosomal peptide synthetases, we then proceeded to test
two different acyl-carrier proteins (ACPs) as potential substrates. The first one, BlmVIII, is
a monomodular multidomain polyketide synthase (PKS) which is encoded in the
bleomycin-biosynthesis gene cluster of S. verticillus (Fig. 2). BlmVIII contains an ACP domain at its
C-terminus, that is, a type I ACP. For the second ACP substrate we used TcmM, a type II
acyl carrier protein involved in the biosynthesis of the aromatic polyketide tetracenomycin C
in S. glaucescens (Shen et al. J. Bacteriol. (1992) 174:3818-3821; Bao et al. Biochemistry
(1998) 37:8132-8138). For the production of TcmM, its coding sequence was transferred
from a construct previously made in pET-22b (Gehring et al. Chem. Biol. (1997) 4:17-24)
into the pET-28a vector to yield pET28a-TcmM, in such a way that a hexahistidine tag
would be added at both the N-terminus and the C-terminus of the protein. Plasmid
pET28a-TcmM was introduced into E. coli BL21(DE3), and TcmM was easily purified by affinity
chromatography using Ni-NTA resin. In vitro phosphopantetheinylation assays were
performed as before, but using BlmVIII and TcmM as protein substrates, and PptA was able
to posttranslationally modify both ACP substrates.
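
As a rough plausibility check on the predicted size of the MBlmX-2 fusion mentioned above, the mass can be approximated from the length of the cloned fragment. The following is only a sketch and is not the calculation used to obtain the 108.5 kDa value: it assumes an average residue mass of 110 Da, assumes that the whole 1,898 bp fragment is coding, and assumes a mass of about 42.5 kDa for the MBP moiety. These three values and the function name are illustrative assumptions.

```python
# Illustrative assumptions, not values taken from the specification.
AVG_RESIDUE_DA = 110.0   # assumed average mass per amino acid residue (Da)
MBP_KDA = 42.5           # assumed mass of the MBP fusion partner (kDa)


def estimate_fusion_mass_kda(insert_bp: int, partner_kda: float = MBP_KDA) -> float:
    """Approximate the mass (kDa) of a fusion protein from the insert length in bp."""
    codons = insert_bp // 3                        # count complete codons only
    insert_kda = codons * AVG_RESIDUE_DA / 1000.0  # mass contributed by the insert
    return insert_kda + partner_kda


if __name__ == "__main__":
    # 1,898 bp fragment encoding the A and PCP domains of BlmX module 2
    print(f"{estimate_fusion_mass_kda(1898):.1f} kDa")
```

Under these assumptions the estimate falls within a few percent of the predicted 108.5 kDa; an exact figure requires the actual amino acid composition of the fusion.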
The pptA gene is not clustered with the bleomycin-biosynthesis locus.

Some bacterial PPTase genes have been found clustered with, or close to, their
respective "partner" NRPS genes: entD {enterobactin (Coderre et al. J. Gen. Microbiol.
(1989) 135:3043-3055)}, {surfactin (Cosmina et al. Mol. Microbiol. (1993) 8:821-831)},
gsp {gramicidin (Borchert et al. J. Bacteriol. (1994) 176:2458-2462)}, bli
{bacitracin (Gaidenko et al. Biotechnologia (1992) 13-19)}, lpa-14 {iturin (Huang et al. J.
Ferment. Bioeng. (1993) 76:445-450)}. To test the possible clustering of pptA with the
bleomycin-biosynthesis (blm) locus, PCR reactions were performed using the THC/KEA-2
primers on several overlapping cosmid clones spanning the blm locus plus 30-40 kb
upstream and downstream of its putative limits. No amplification product could be obtained
in these reactions, showing that the pptA gene is not clustered with the blm locus.

Discussion 

It has been suggested that in organisms containing multiple
phosphopantetheine-requiring pathways, each pathway has its own posttranslational
modifying activity (Walsh et al. Curr. Opin. Chem. Biol. (1997) 1:309-315). Our group
has found that S. verticillus ATCC15003 contains several PKS and NRPS gene clusters, one
of them being responsible for bleomycin production (a hybrid NRPS/PKS system) (Shen et
al. Bioorg. Chem. (1999) 27:155-171; Du et al. Chem. Biol. (1999) 6:507-517). This
suggested that the gene encoding the PPTase for the BLM NRPS could also be clustered with,
or close to, the NRPS genes. However, we did not find this gene after sequencing almost
the whole blm NRPS locus. Because this gene could be important for
expressing functional NRPS modules from the blm cluster, we decided to clone the PPTase
gene. Additionally, if the "one NRPS cluster - one PPTase" hypothesis were true, it seemed
possible to use PPTase sequences as a new kind of probe to clone novel NRPS clusters.

We know that in S. verticillus there are several NRPS loci (possibly four), so
we expected several "PCP-type" PPTases. However, we have amplified only one, and it does
not seem to be closely linked to any of the NRPS loci. Interestingly, in the actinomycete
Mycobacterium tuberculosis, whose genome is fully sequenced, there is only one PCP-type
PPTase gene, which is not clustered with either of the two NRPS loci present in this organism
(Quadri et al. Chem. Biol. (1998) 5:631-645). These and other lines of indirect evidence suggest
that cluster-specific PPTases are not the general rule but most probably the
exception, especially in organisms containing multiple NRPS clusters. There is also strong
evidence that at least some PCP-type PPTases can posttranslationally modify PCPs from
different clusters and even different organisms (Quadri et al. Chem. Biol. (1998) 5:631-645;
Gehring et al. Biochemistry (1998) 37:11637-11650). It is most likely that there is only one
PCP-type PPTase in S. verticillus and that its gene is not necessarily clustered with any of the
NRPS loci.

Biochemical characterization of the purified PptA protein confirmed not only
its PPTase activity but also its broad specificity, comparable to that of Sfp. Different
apo-PCPs (type I and type II) and a type I apo-ACP from the bleomycin synthetase, and the
type II apo-ACP from the tetracenomycin PKS of Streptomyces glaucescens, were efficiently used
as substrates by PptA. These results suggest PptA as a good candidate for heterologous
coexpression with NRPS and PKS genes to overproduce active holo-synthase enzymes.

It is understood that the examples and embodiments described herein are for 
illustrative purposes only and that various modifications or changes in light thereof will be 
suggested to persons skilled in the art and are to be included within the spirit and purview of 
this application and scope of the appended claims. All publications, patents, and patent 
applications cited herein are hereby incorporated by reference in their entirety for all 
purposes.

CLAIMS

What is claimed is:



any one of the primer pairs identified in Table II and the nucleic acid of a
bleomycin-producing organism as a template.

2. The isolated nucleic acid of claim 1, wherein said nucleic acid 
comprises a nucleic acid encoding at least two open reading frames selected from the group 
consisting of Blm open reading frames 8 through 41. 

3. The isolated nucleic acid of claim 1, wherein said nucleic acid
comprises a nucleic acid encoding at least three open reading frames selected from the group
consisting of Blm open reading frames 8 through 41.

4. The isolated nucleic acid of claim 1, wherein said nucleic acid
comprises a nucleic acid encoding a C domain lacking one or more His residues of the
conserved HHxxxDG active site for transpeptidation.

5. The isolated nucleic acid of claim 1, wherein said nucleic acid
comprises a nucleic acid encoding a protein encoded by a gene selected from the group
consisting of blmI, blmII, and blmXI.



6. An isolated nucleic acid comprising a nucleic acid encoding a module
comprising two or more catalytic domains of a protein encoded by a nucleic acid of a
bleomycin gene cluster wherein said catalytic domains are selected from the group consisting
of a condensation (C) domain, an adenylation (A) domain, a peptidyl carrier protein (PCP)
domain, a condensation/cyclization domain (Cy), an acyl-carrier protein (ACP)-like domain,
an oxidization domain (Ox), a ketoacyl synthase (KS) domain, an acetyl transferase (AT)
domain, a ketoreductase (KR) domain, and a methyltransferase (MT) domain.

7. The isolated nucleic acid of claim 6, wherein said nucleic acid
comprises a nucleic acid encoding one or more proteins comprising a module selected from
the group consisting of NRPS-0, NRPS-1, NRPS-2, NRPS-3, NRPS-4, NRPS-5, NRPS-6,
NRPS-7, NRPS-7, NRPS-9, and PKS.

8. The isolated nucleic acid of claim 7, wherein said nucleic acid
comprises an open reading frame from SEQ ID NO: 1, SEQ ID NO: 2, or SEQ ID NO: 3.

9. An isolated nucleic acid comprising a nucleic acid encoding a protein
encoded by a gene from a BLM gene cluster.

10. The nucleic acid of claim 9, wherein said nucleic acid comprises a
nucleic acid encoding a protein encoded by a gene selected from the group consisting of
blmI, blmII, and blmXI.

11. The nucleic acid of claim 9, wherein said nucleic acid comprises a
nucleic acid encoding a protein encoded by a gene selected from the group consisting of
blmIII, blmIV, blmV, blmVI, blmVII, blmIX, and blmX.

12. The nucleic acid of claim 9, wherein said nucleic acid comprises a
nucleic acid encoding a protein encoded by blmVIII.

13. The nucleic acid of claim 9, wherein said nucleic acid comprises a
nucleic acid selected from the group consisting of blmI, blmII, and blmXI.

14. The nucleic acid of claim 9, wherein said nucleic acid comprises a
nucleic acid selected from the group consisting of blmIII, blmIV, blmV, blmVI, blmVII,
blmIX, and blmX.

15. The nucleic acid of claim 9, wherein said nucleic acid comprises
blmVIII.

16. An isolated nucleic acid comprising a nucleic acid that encodes a
protein comprising at least one catalytic domain selected from the group consisting of a
condensation (C) domain, an adenylation (A) domain, a peptidyl carrier protein (PCP)
domain, a condensation/cyclization domain (Cy), an acyl-carrier protein (ACP)-like domain,
an oxidization domain (Ox), a ketoacyl synthase (KS) domain, an acetyl transferase (AT)
domain, a ketoreductase (KR) domain, and a methyltransferase (MT) domain, and that
hybridizes to a nucleic acid selected from the group consisting of orf8, orf9, orf10, orf11,
orf12, orf13, orf14, orf15, orf16, orf17, orf18, orf19, orf20, orf21, orf22, orf23, orf24,
orf25, orf26, orf27, orf28, orf29, orf30, orf31, orf32, orf33, orf34, orf35, orf36, orf37, orf38,
orf39, orf40, and orf41 under stringent conditions.

17. The nucleic acid of claim 16, wherein said isolated nucleic acid 
comprises a nucleic acid encoding a module. 

18. The nucleic acid of claim 16, wherein said isolated nucleic acid
comprises a nucleic acid encoding a BLM gene.

19. An isolated nucleic acid comprising a nucleic acid selected from the
group consisting of orf8, orf9, orf10, orf11, orf12, orf13, orf14, orf15,
orf16, orf17, orf18, orf19, orf20, orf21, orf22, orf23, orf24, orf25, orf26, orf27, orf28, orf29,
orf30, orf31, orf32, orf33, orf34, orf35, orf36, orf37, orf38, orf39, orf40, and orf41, or an
allelic variant thereof.

20. The nucleic acid of claim 19, wherein said nucleic acid comprises a
nucleic acid that is a single nucleotide polymorphism (SNP) of a nucleic acid selected from
the group consisting of orf8, orf9, orf10, orf11, orf12, orf13, orf14, orf15,
orf16, orf17, orf18, orf19, orf20, orf21, orf22, orf23, orf24, orf25, orf26, orf27, orf28,
orf29, orf30, orf31, orf32, orf33, orf34, orf35, orf36, orf37, orf38, orf39, orf40, and orf41.

21. An isolated gene cluster comprising open reading frames encoding
polypeptides sufficient to direct the assembly of a bleomycin.

22. An isolated multi-functional protein complex comprising both a
polyketide synthase (PKS) and a peptide synthetase (NRPS).

23. An isolated nucleic acid encoding a multi-functional protein complex
comprising both a polyketide synthase (PKS) and a peptide synthetase (NRPS).

24. An isolated polypeptide comprising a catalytic domain encoded by a
nucleic acid of a bleomycin gene cluster wherein said nucleic acid comprises a nucleic acid
selected from the group consisting of

a nucleic acid encoding any one of Blm open reading frames (ORFs) 8
through 41; and

a nucleic acid amplified by polymerase chain reaction (PCR) using
any one of the primer pairs identified in Table II.

25. The polypeptide of claim 24, wherein said polypeptide comprises an
enzymatic domain selected from the group consisting of a condensation (C) domain, an
adenylation (A) domain, a peptidyl carrier protein (PCP) domain, a condensation/cyclization
domain (Cy), an acyl-carrier protein (ACP)-like domain, an oxidization domain (Ox), a
ketoacyl synthase (KS) domain, an acetyl transferase (AT) domain, a ketoreductase (KR)
domain, and a methyltransferase (MT) domain.

26. The polypeptide of claim 25, wherein the nucleic acid of a bleomycin
gene cluster comprises a nucleic acid encoding at least two open reading frames selected
from the group consisting of Blm open reading frames 8 through 41.

27. The polypeptide of claim 25, wherein said nucleic acid of a bleomycin
gene cluster comprises a nucleic acid encoding at least three open reading frames selected
from the group consisting of Blm open reading frames 8 through 41.

28. The polypeptide of claim 25, wherein said polypeptide comprises a C
domain lacking one or more His residues of the conserved HHxxxDG active site for
transpeptidation.

29. The polypeptide of claim 25, wherein said polypeptide is a polypeptide
encoded by a gene selected from the group consisting of blmI, blmII, and blmXI.

30. An isolated polypeptide comprising a module comprising two or more
catalytic domains of a protein encoded by a nucleic acid of a bleomycin gene cluster wherein
said catalytic domains are selected from the group consisting of a condensation (C) domain,
an adenylation (A) domain, a peptidyl carrier protein (PCP) domain, a
condensation/cyclization domain (Cy), an acyl-carrier protein (ACP)-like domain, an
oxidization domain (Ox), a ketoacyl synthase (KS) domain, an acetyl transferase (AT)
domain, a ketoreductase (KR) domain, and a methyltransferase (MT) domain.

31. The polypeptide of claim 30, wherein said polypeptide comprises a
module selected from the group consisting of NRPS-0, NRPS-1, NRPS-2, NRPS-3, NRPS-4,
NRPS-5, NRPS-6, NRPS-7, NRPS-7, NRPS-9, and PKS.

32. An isolated polypeptide encoded by a gene from a BLM gene cluster. 

33. The polypeptide of claim 32, wherein said polypeptide is encoded by a
gene selected from the group consisting of blmI, blmII, and blmXI.

34. The polypeptide of claim 32, wherein said nucleic acid comprises a
nucleic acid encoding a protein encoded by a gene selected from the group consisting of
blmIII, blmIV, blmV, blmVI, blmVII, blmIX, and blmX.

35. The polypeptide of claim 32, wherein said polypeptide is encoded by
blmVIII.

36. An isolated polypeptide comprising a module wherein said module is 
specifically bound by an antibody that specifically binds to a BLM module selected from the 
group consisting of NRPS-0, NRPS-1, NRPS-2, NRPS-3, NRPS-4, NRPS-5, NRPS-6, 
NRPS-7, NRPS-7, NRPS-9, and PKS. 

37. The polypeptide of claim 36, wherein said polypeptide is specifically
bound by an antibody that specifically binds to a polypeptide encoded by a gene selected
from the group consisting of blmI, blmII, blmXI, blmIII, blmIV, blmV, blmVI, blmVII,
blmIX, blmX, and blmVIII.

38. An isolated polypeptide comprising a polypeptide encoded by an open
reading frame of a nucleic acid selected from the group consisting of SEQ ID NO:1, SEQ ID
NO:2, and SEQ ID NO:3, or an allelic variant thereof.

39. The polypeptide of claim 38, wherein said nucleic acid comprises a
single nucleotide polymorphism (SNP) of an open reading frame of a nucleic acid selected from the
group consisting of SEQ ID NO:1, SEQ ID NO:2, and SEQ ID NO:3.

40. An expression vector comprising a nucleic acid of any one of claims 1
through 23.



41. A host cell transformed with an expression vector of claim 40. 

42. The host cell of claim 41, wherein said cell is transformed with an
exogenous nucleic acid comprising a gene cluster encoding polypeptides sufficient to direct
the assembly of a bleomycin or bleomycin analog.

43. The cell of claim 41, wherein said cell is a bacterial cell.

44. The cell of claim 43, wherein said cell is a Streptomyces cell. 

45. The cell of claim 41, wherein said cell is a eukaryotic cell. 

46. A method of chemically modifying a biological molecule, said method 
comprising contacting a biological molecule that is a substrate for a polypeptide encoded by 
one or more bleomycin biosynthesis gene cluster open reading frames with the polypeptide 
encoded by one or more bleomycin biosynthesis gene cluster open reading frames, whereby 
said polypeptide chemically modifies said biological molecule. 

47. The method of claim 46, wherein said method comprises contacting
said biological molecule with at least two different polypeptides encoded by blm gene cluster
open reading frames.

48. The method of claim 46, wherein said method comprises contacting
said biological molecule with at least three different polypeptides encoded by blm gene
cluster open reading frames.

49. The method of claim 46, wherein said contacting is in a host cell.

50. The method of claim 49, wherein said host cell is a bacterium.

51. The method of claim 46, wherein said contacting is ex vivo.

52. The method of claim 46, wherein said biological molecule is an
endogenous metabolite produced by said host cell.

53. The method of claim 46, wherein said biological molecule is an
exogenously supplied metabolite.



54. The method of claim 46, wherein said host cell is a eukaryotic cell. 

55. The method of claim 54, wherein said eukaryotic cell is selected from 
the group consisting of a mammalian cell, a yeast cell, a plant cell, a fungal cell, and an 
insect cell. 

56. The method of claim 46, wherein said biological molecule is an amino 
acid and said polypeptide is a peptide synthetase. 

57. The method of claim 46, wherein said polypeptide is a methyl
transferase.

58. A method of coupling a first amino acid to a second amino acid, said 
method comprising contacting the first and second amino acid with a recombinantly 
expressed bleomycin nonribosomal peptide synthetase (NRPS). 

59. The method of claim 64, wherein said NRPS is selected from the 
group consisting of NRPS-5, NRPS-4, NRPS-3, NRPS-9, NRPS-8, and NRPS-7. 

60. The method of claim 64, wherein said NRPS is selected from the 
group consisting of NRPS-6, NRPS-2, NRPS-1, and NRPS-0. 

61. The method of claim 64, wherein said contacting is in a host cell.

62. A method of coupling a first fatty acid to a second fatty acid, said
method comprising contacting the first and second fatty acids with a recombinantly
expressed bleomycin polyketide synthase (PKS).

63. The method of claim 62, wherein said contacting is in a host cell.

64. A method of producing a bleomycin or bleomycin analog, said method 

comprising: 

providing a cell transformed with an exogenous nucleic acid 
comprising a bleomycin gene cluster encoding polypeptides sufficient to direct the assembly 
of said bleomycin or bleomycin analog; 



culturing the cell under conditions permitting the biosynthesis of 
bleomycin or bleomycin analog; and 

isolating said bleomycin or bleomycin analog from said cell. 

65. An isolated nucleic acid comprising a nucleic acid encoding a
phosphopantetheinyl transferase, said nucleic acid encoding a phosphopantetheinyl
transferase being selected from the group consisting of:

a nucleic acid encoding the protein encoded by the nucleic acid of
SEQ ID NO:3;

a nucleic acid amplified by polymerase chain reaction (PCR) using
primers that specifically amplify ORF 41 (primers: SEQ ID NO:71 and SEQ ID NO:72) and
Streptomyces nucleic acid as a template;

a nucleic acid encoding a polypeptide having phosphopantetheinyl
transferase activity where said nucleic acid specifically hybridizes to the nucleic acid of SEQ
ID NO:3 under stringent conditions.

66. The nucleic acid of claim 65, said nucleic acid comprising a nucleic
acid of SEQ ID NO:3.

67. A polypeptide comprising a phosphopantetheinyl transferase encoded 
by SEQ ID NO:3. 

68. A vector comprising the nucleic acid of claim 66. 

69. A cell transfected with the vector of claim 68. 

70. A method of converting an apo-carrier protein to a holo-carrier protein 
comprising reacting said apo-carrier protein with a recombinant phosphopantetheinyl 
transferase encoded by SEQ ID NO:3 and coenzyme A thereby producing a holo-carrier 
protein. 

71. A cell comprising a modified bleomycin gene cluster nucleic acid, 
said cell producing elevated amounts of bleomycin as compared to the wild type cell. 

72. The cell of claim 71, wherein said cell overexpresses a resistance gene
from the bleomycin gene cluster.

73. The cell of claim 72, wherein said resistance gene is a gene listed in
Table III.

BLEOMYCIN GENE CLUSTER COMPONENTS AND THEIR USES

ABSTRACT OF THE DISCLOSURE 

This invention provides detailed sequence analysis and characterization of the 
gene cluster responsible for the synthesis of bleomycin in Streptomyces verticillus. The 
bleomycin gene cluster provides the first hybrid polyketide synthase/nonribosomal peptide 
synthetase pathway and elucidation of the various modules and enzymatic domains 
characterizing the pathway provides convenient synthetic routes for bleomycins, bleomycin 
analogs, and various other polyketides.

PATENT APPLICATION DECLARATION
(Attorney's Docket No.: 2500.125US2)

Each of the Applicants named below hereby declares as follows: 

1. My residence, post office address and country of citizenship given below 
are true and correct. 

2. I believe I am the original, first and joint inventor of the subject matter
which is claimed and for which a patent is sought in the patent application entitled "BLEOMYCIN
GENE CLUSTER COMPONENTS AND THEIR USES," Serial No.          , filed
January 5, 2000, and I have reviewed and understand the contents of the specification, including
its claims.

3. I acknowledge my duty to disclose to the Office all information known to
me to be material to patentability of this application, in accordance with 37 C.F.R. Section 1.56,
which is defined on the attached page.

I further declare that all statements made herein of my own knowledge are true and 
that all statements made on information and belief are believed to be true; and further that these 
statements were made with the knowledge that willful false statements and the like so made are 
punishable by fine or imprisonment, or both, under Section 1001 of Title 18 of the United States 
Code, and that such willful false statements may jeopardize the validity of the application or any 
patent issuing thereon.

Section 1.56 Duty to Disclose Information Material to Patentability.

(a) A patent by its very nature is affected with a public interest. The public interest is 
best served, and the most effective patent examination occurs when, at the time an application is being 
examined, the Office is aware of and evaluates the teachings of all information material to patentability. Each 
individual associated with the filing and prosecution of a patent application has a duty of candor and good 
faith in dealing with the Office, which includes a duty to disclose to the Office all information known to that 
individual to be material to patentability as defined in this section. The duty to disclose information exists 
with respect to each pending claim until the claim is cancelled or withdrawn from consideration, or the 
application becomes abandoned. Information material to the patentability of a claim that is cancelled or 
withdrawn from consideration need not be submitted if the information is not material to the patentability of 
any claim remaining under consideration in the application. There is no duty to submit information which 
is not material to the patentability of any existing claim. The duty to disclose all information known to be 
material to patentability is deemed to be satisfied if all information known to be material to patentability of 
any claim issued in a patent was cited by the Office or submitted to the Office in the manner prescribed by 
§§ 1.97(b)-(d) and 1.98. However, no patent will be granted on an application in connection with which 
fraud on the Office was practiced or attempted or the duty of disclosure was violated through bad faith or 
intentional misconduct. The Office encourages applicants to carefully examine: 

(1) prior art cited in search reports of a foreign patent office in a counterpart 
application, and 

(2) the closest information over which individuals associated with the filing or 
prosecution of a patent application believe any pending claim patentably defines, to make sure that 
any material information contained therein is disclosed to the Office. 

(b) Under this section, information is material to patentability when it is not cumulative 
to information already of record or being made of record in the application, and 

(1) It establishes, by itself or in combination with other information, a prima facie case 
of unpatentability of a claim; or 

(2) It refutes, or is inconsistent with, a position the applicant takes in: 

(i) Opposing an argument of unpatentability relied on by the Office, or 

(ii) Asserting an argument of patentability. 

A prima facie case of unpatentability is established when the information compels a conclusion that a claim 
is unpatentable under the preponderance of evidence, burden-of-proof standard, giving each term in the claim 
its broadest reasonable construction consistent with the specification, and before any consideration is given 
to evidence which may be submitted in an attempt to establish a contrary conclusion of patentability. 

(c) Individuals associated with the filing or prosecution of a patent application within 
the meaning of this section are: 

(1) Each inventor named in the application; 

(2) Each attorney or agent who prepares or prosecutes the application; and 

(3) Every other person who is substantively involved in the preparation or prosecution 
of the application and who is associated with the inventor, with the assignee or with anyone to whom 
there is an obligation to assign the application. 

(d) Individuals other than the attorney, agent or inventor may comply with this section 
by disclosing information to the attorney, agent, or inventor.

SEQUENCE LISTING

SEQ ID NO: 1 BLM gene cluster ORFs 30 through 8

(note orf 31-40 on sequence 1-18660 are translated on the reverse strand and on 
separate file) 
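
The listing below pairs each 60-nucleotide row with the translation of its reading frame, and the note above refers to ORFs read from the reverse strand. As an aid to checking rows by hand, both operations can be sketched as follows. This is a minimal illustration that assumes the standard genetic code; the function names are hypothetical and not part of the listing. Note also that the listing shows Met for a GTG start codon (for example at the start of orf30), whereas a plain codon table reads GTG as Val.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement, as needed for ORFs on the reverse strand."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons in frame 1; '*' marks a stop codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3   # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


if __name__ == "__main__":
    # Row 18841-18900 of the listing (within orf30)
    row = "CAGGCGAAGGCCGAGTTCTTCCGGATGCTGGGGCACCCGGTCCGCATCCGCGTACTGGAG"
    print(translate(row))
```

For a reverse-strand ORF, `translate(reverse_complement(segment))` would be applied to the relevant segment; frame and coordinates must still be chosen by the reader.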

18601 ACCCATCTCATAGGTGTACGCGCTGGAGCATTCGGGGCACGACGGAAGGTTCTCGGTCAC 18660

18661 GAGAGCACTGTAAGCCCGAACCCGCAAGGATGACGAATTGCAAAATTGTGCAAGTCGCTA 18720

18721 CATGATGGTCCGGCTGTGCCCGCAGGTAGCCGCGGGCACAGCACCAGACGCTGCCTCCGC 18780

18781 GCACCGCGCGGGAGGCCCGGTGAGGCGAGAGGCTGAGGTTCCGTGCCGGTTCCGCTGTAT 18840

M P V P L Y (orf30)

18841 CAGGCGAAGGCCGAGTTCTTCCGGATGCTGGGGCACCCGGTCCGCATCCGCGTACTGGAG 18900
QAKAEFFRMLGHPVRIRVLE

18901 CTGCTGCAGGACGGGCCGATGCCGGTGCGTGATCTGCTGGCGGCGATCGAGATCGAGCCC 18960
LLQDGPMPVRDLLAAIEIEP

18961 TCGGCGCTGTCCCAGCAGCTGGCGGTGTTGCGCCGCTCGGGCATCGTGACCTCCACCCGC 19020
SALSQQLAVLRRSGIVTSTR

19021 ACGGGTTCCACGGTCGTCTACGAGCTGGCCGGTGGCGACGTGGCGGAGCTGATGTCCGCC 19080
TGSTVVYELAGGDVAELMSA

19081 GCGCGCCGCATCCTGACCGAGATGCTCAATGGGCAGCACGAGCTGCTGGAGGAGCTGAGG 19140
ARRILTEMLNGQHELLEELR

19141 GAAGCCGAGGTCAGTGCCCGGTGAGCTCCCTCGCCGTCCGGGTGGGAGCCCGGGTGCGTT 19200
EAEVSAR*

MSSLAVRVGARVRS (orf29)

19201 CCGTGCTGCCCACCCGCGCCGACCTCGCGGGCATGGGCCGCAGCCCGCGACGTGATCTAC 19260
VLPTRADLAGMGRSPRRDLL

19261 TGGCCGGTCTGACCGTGGCGATCGTGGCCCTGCCGCTCGCCCTCGGATTCGGCGTCTCCT 19320
AGLTVAIVALPLALGFGVSS

19321 CCGGTCTCGGCGCGGAGGCAGGGCTGGCCACCGCGGTGGTGGCGGGCGCGCTGGCCGCGG 19380
GLGAEAGLATAVVAGALAAV

19381 TATTCGGTGGGTCGAATCTCCAGGTGTCCGGGCCCACGGGCGCCATGACCGTGGTCCTGG 19440
FGGSNLQVSGPTGAMTVVLV

19441 TGCCCATCGTCGCCCGGTACGGCCCCGGCGGTGTCCTCACGGTCGGCCTGCTCGCCGGAC 19500
PIVARYGPGGVLTVGLLAGL

19501 TGATGCTGATCGCGCTCGCCCTCGCCCGCGCCGGCCGCTACATGCAGTACGTGCCGGCCC 19560
MLIALALARAGRYMQYVPAP

19561 CGGTGGTGGAGGGCTTCACCCTCGGCATCGCCTGCGTGATCGGCTTGCAGCAGGTGCCGA 19620
VVEGFTLGIACVIGLQQVPN

19621 ACGCCCTGGGAGTCGCCAAGCCGGAGGGCGACAAGGTCCTCGTCGTGACCTGGCGCGCGG 19680
ALGVAKPEGDKVLVVTWRAV

19681 TCGAGACCTTCGCCGGGGCGCCCAACTGGACCGCTGCCGGACTGGCGGCAGCGGTCGCCG 19740
ETFAGAPNWTAAGLAAAVAA

19741 CGGTCATGCTGACCGGCGCGCGGTGGCGGCCGGTCGTTCCCTTCTCCCTCCTCGCGGTGA 19800
VMLTGARWRPVVPFSLLAVT

19801 CCGGTGCCACCGTCGTGGCCCAGCTGTGCCACCTGGACGCGGCCCGCCCGATCGGGGACC 19860
GATVVAQLCHLDAARPIGDL

19861 TGCCCGCGGGGCTGCCCGCCCCGTCGCTGGCCTTCCTGGACCTCGGAGCACTGGGCTCGC 19920
PAGLPAPSLAFLDLGALGSL

19921 TGCTGGCGCCTGCCGTGGCCGTGGCGGCCCTTGCCGCGTTGGAATCGCTGCTGTCGGCGT 19980
LAPAVAVAALAALESLLSAS

CCGTCGCGGACGGCATGACGGTCGGGCAGAAGCACGACCCGGACAAGGAGCTGTTCGGGC
VADGMTVGQKHDPDKELFGQ

AGGGTCTCGCCAACCTGGCCGCCCCGCTGTTCGGCGGCGTCCCGGCCACCGGCGCGATAG
GLANLAAPLFGGVPATGAIA

CCCGCACCGCCGTCAACGTCCGTACCGGTGCGAGCTCGCGACTGGCGGCCCTCACGCACG
RTAVNVRTGASSRLAALTHA

CCGCGATCCTCGCCGTCATCGTCTTCGCCGCCGCCCCACTGGTCTCCCGCATCCCCCTGG
AILAVIVFAAAPLVSRIPLA

CCGCGCTCGCCGGCGTGCTGATCGCGACCGCGATCCGCATGGTCGAAGTGGGCAGCCTGC
ALAGVLIATAIRMVEVGSLR



20341 CCGTGGCCCTGGACCTCGTCTACGCCGTCATCATCGGCCTGCTGGTCGCCGGCGCACTCG
VALDLVYAVIIGLLVAGALA

20401 CCCTGCGGGCCGTGGCCAAGCAGGTCCGCCTGGACCAGGTCTCCTTGAAGGAGGACCTGA
LRAVAKQVRLDQVSLKEDLT

20461 CCGGCGACCACAGCGCCGAGGAACACGCGCTGCTCGCCGAGCACATCGTGGCGTACCGCA
GDHSAEEHALLAEHIVAYRI

20521 TCGACGGTCCGCTGTTCTTCGCCGCGGCCCACCGCTTCCTGCTGGAACTCTCGGACGTCG
DGPLFFAAAHRFLLELSDVA

20581 CGGACGTGCGCGTGGTGATCCTGCGCATGTCCCGCGTGACCACCATGGACGCCACCGGCG
DVRVVILRMSRVTTMDATGA

20641 CCCTCGTCCTGAAGGACGCGGTCACCAAGCTGAACCGGCGCGGCATCACCGTCCTGGCCT
LVLKDAVTKLNRRGITVLAS

20701 CCGGGGTACGCCCCGGCCAGCGCCGGGTCCTCGACTCCGTCGGCGCCCTCGGTCTGCTCC
GVRPGQRRVLDSVGALGLLR

20761 GGGCCGCCACCGGCGACGACTACACCGGCACTCCCGAAGCCATCGCCGCCGCCCGAAGCC
AATGDDYTGTPEAIAAARSH

20821 ACCTGCACGGCGCCGGTGTCCTGGCCCCCGCCTGCCCGGGCCCGCCTCCTCCGGTACCCC
LHGAGVLAPACPGPPPPVPP

20881 CACCGTGCGCTCCGAGTGCCCGACGATGAGGAGCCGACCGAGGTCCTCCTCCGTCACCCG
PCAPSARR*



20941 GACACCCACGGTTGCGCCGCCCCATGCCGGCGGTCCCTCCTGACGGCCCGTCCGCGGCTT 21000

21001 GAGGCGGCGGTGGACGGCCTGCCGCCGCCGGCCTCGGGCTGATCGGCGTGATCACCGCCC 21060

21061 ATGCGCGGGTGGGCGCCCGCGGCATCGTGGGCGGGACCGTGTTCCCGGCCACCGCGGCGG 21120

21121 CCGGCCTCGCGCTGGGCGTGGCCTGCCGCGGTGCCTGGTAGCGGCGGGGTCCGGCGGCCG 21180

21181 GGCCTGTGCTTCTTCCCGCCCGTCCGGCGGGTGGCGCCGCGCCGGCGGTGACAGGGAAAT 21240

21241 ATGACCGGAACTGGGATGCTCGCGTCCACTCGGGTGTGTTTAAGTGCCACGGGGGCTTCC 21300

21301 GACGGCGCGTCGCGCGCCGGCGGTTCGCCCGATGATGGTCGTGCGGCGCTGTGAGCCGGG 21360

21361 GAGCCTATGGCACAGGACCTGAACGACTGGATCGAGGACGAGGTCGTCCCTTACGAGGAG 21420

MAQDLNDWIEDEVVPYEE (orf28)

21421 AAGCCTCTCGAATGGATCTCCCAGTACCACTTCTTCCGCGACCCGGCGCGAGCCGCCTAT 21480
KPLEWISQYHFFRDPARAAY

21481 GTCGATCACACCTACTTCTTCTCACCGGCCGATGGCGCGATCGTCTACCAGAAAGTAGTG 21540
VDHTYFFSPADGAIVYQKVV

21541 GATCCCCAGGAGTCGATCATCGACATCAAGGGGAAGCCGTACTCGCTGGCCGCCGCGCTC 21600
DPQESIIDIKGKPYSLAAAL

21601 CGTGACGAATCGTTCGGTCACCGGTGCCTGGTGATCGGCATCTTCATGACCTTCTTCGAC 21660
RDESFGHRCLVIGIFMTFFD

21661 GTGCACATCAACCGGATGCCTTACGGCGGCCGTCTCTCCTTCGCGCTCAAGGAGCCCATC 21720 
VHINRMPYGGRLSFALKEPI 

21721 GGGACGTTCAACCTCCCCATGCTGGCCATGGAGCAGGACCTGCTCGAACGGCTCCGGGTC 21780 
GTFNLPMLAMEQDLLERLRV 

21781 AATCCGGCTCACGCGAGGTATCTGCACCTGAACGAGCGGATGGTCAACCGGGTCGACGCG 2184 0 
NPAHARYLHLNERMVNRVDA 

21841 CCGCGGCTCCGGGGCCCGTACTGGATGCTCCAGATCGCCGACTACGACGTCGACTCCATC 2190 0 
PRLRGPYWMLQ IADYDVDS I 

21901 ACCCCGTTCTGCAGACGGCAGGGAATGTTCCGCTCCCAGGGGCGCCGCTTCTCCCAGATC 2196 0 
TPFCRRQGMFRSQGRRFSQI 

21961 CGCTACGGATCGCAGGTCGACCTGGTGATCCCGATGGCGGCCGACCGCGAGTACGTCCCC 22 02 0 
RYGSQVDLVIPMAADREYVP 

22 021 GTGGAGGCCGTCGGCCGGCACGTGAAGGCGGGGCTCGACCCGCTCGTCAAGATCCGGTGG 22 08 0 
VEAVGRHVKAGLDPLVKIRW 

22 081 CGTTGAAGAGCGCGTACGAAGCGATGGCGAACTGGAGGGACACAGCGTGGGTTTCCGTCG 2214 0 

R * M G F R R (orf27) 

22141 AGCGCAGAGGGCCGGTGGGCCGGGAGCGGGCCGGCGGGAGAGCGCCCGGTTCAGGCCGGA 22 20 0 
AQRAGGPGAGRRESARFRPD 

22201 CGGGCCGTCGGCGCCGCGGGACCGTCCGTTACCCCTGTCCGCCGGGCAGTTGTTCGAGTG 22 26 0 
GPSAPRDRPLPLSAGQLFEW 

22261 GGTGTTTGACAAGCTCGTCGACGGAGATCTGAGCCACCAGCCGACGATTGTGCGGCTCCG 22 32 0 
VFDKLVDGDLSHQPTIVRLR 

22321 CGGCCCGCTGAACACCGCCGCCCTGCGGATGGCCTACGCCCGGCTGGTGCGGCGCCACGA 223 8 0 
GPLNTAALRMAYARLVRRHE 

22 381 GTGCCTGCGCACCCGCTTCCCCGTGATCGACGGGGAGCCCGTGCAGGTGATCGAGGGCAT 2244 0 
CLRTRFPVIDGEPVQVIEGI 

22441 CGGGAAAGCAGCGGGGGGCCCGCTGCCGCTCATCGATCTGCGCCACCTCCCGGAGGCGCT 22500 
GKAAGGPLPLIDLRHLPEAL 

22 501 TCGCGCGCGCGAGATCGCGAGGATCCGCGAGGAGACGCTGTCCACGCCGGTCCCCTTCGA 22 56 0 
RAREIARIREETLSTPVPFD 

22 561 CAAGCGGCCGCCCGTCCGCGTGGCGCTGATCCGGGCGGCGCCCGAGGAGCACCTCTTCCT 22 620 
KRPPVRVALIRAAPEEHLFL 

22 621 CGTCGGCATCCCGCACATCACCGCGGACCTGTGGTCCGCGACCCTGCTCAACGACGAGCT 22 68 0 
VGI PHITADLWSATLLNDEL 

22 681 CATGGCGCACTACAGGGCGGGGGCCGAGGGGACTCCCTCCCGGGCCCCCACCCCCGTCGC 2274 0 
MAHYRAGAEGTPSRAPTPVA 

22 741 GCAGTACGCCGACTTCGCGCAGTGGCAGCGCGCGTGGTGGAACCGGGACCGCACCGAGCG 22 800 
QYADFAQWQRAWWNRDRTER 

22 8 01 GGAGGCCGGACGGTGGCGGGCGCGGCTGGACGGGCTGTCCGCCGTGGAACTGCCCCTGGA 22 86 0 
EAGRWRARLDGLSAVELPLD 

22 861 CCGGCCCCGCCCCGCGGGCCGCCGGCGGGACTGCTTCCTGATCGGGGACACCTTCGACGC 22 92 0 
RPRPAGRRRDCFLIGDTFDA 

22 921 CGAACTGAGCGACCGGCTGCGCGCCTTGGCACGCACCGCCGACGTCACGCTGTACGTGGT 22 98 0 
ELSDRLRALARTADVTLYVV 

22 981 GCTGCTGGCGGCGTTCCACTGGCTGGTGGGGCGGATGTCGGGCGCCGGCCGGCTGGTGAC 23 040 

LLAAFHWLVGRMSGAGRLVT 

23 04 1 CACCTCGCTCGTGGCCGCCCGGCACGGCAGCGCGGTACAGGGGATGACCGGCCCGTTCTC 2310 0 
TSLVAARHGSAVQGMTGPFS 

23101 GGACTACCTGGCCCTGGTCGGGGACCTGTCGGGCGATCCGGACTTCCTGGAGTCCCTGCG 23160 
DYLALVGDLSGDPDFLESLR 

23161 CCGCGTACGCGACGAGTGCCTGACCGCCCACGACCACCAGCGGCTTCCGTTCTCACAGGT 2322 0 
RVRDECLTAHDHQRLPFSQV 

23221 CCTCGAAGTCATGGACCCCGGACGCGAGTTGCACCCCCATCCGCTGGAGCAGCTCGGGTT 23280 
LEVMDPGRELHPHPLEQLGF 

23281 CAACCTCCACAACATCCCTCCCGCGGTCATGGACTTCTCCGGCGACGTCGTCGTCTCGGC 23340 
NLHNI PPAVMDFSGDVVVSA 

23 341 GGTGAACCCGGAGGGGGACGACGGGGAGAGCGGCGACGGGGAGTACGTGCCCTGGACCGC 234 0 0 
VNPEGDDGESGDGEYVPWTA 

23401 CGACCTGACCTTCGACGTCTACGACTACGGCACCGGCCATATGCCGTTCGACGTGATACT 23460 
DLTFDVYDYGTGHMPFDVIL 

23461 CGACCGGCGGCTGGCCGATCCGGCGACGGCCCGGGAGTGGGCCGGGCACTACCGGTCGGT 23 52 0 
DRRLADPATAREWAGHYRSV 

23521 GCTCCGTGCGGTCGTCGCCGACCCCGGCGTGCGCCTGTCCGCCCTCGGCACCCTGCTGTC 23 58 0 
LRAVVADPGVRLSALGTLLS 

23581 CCTGCCGCGACCGCCGTCCGCCACGTCCTTCGGCGGCCGGGAGATCGACGTCCGGCGCGT 23 64 0 
LPRPPSATSFGGREIDVRRV 

23 641 CGAACGCGAGTTGGCGGGGCGCGACGGGATCACCGCCGCCCTGGTCGCGGTGGCGCCCCG 23700 

ERELAGRDGITAALVAVAPR 

23701 GCGCCTGGCCACCGGGCTGCGCGTACGGGAACTGGTCGCCTACTGCGCCGTCGAGGGCAC 23 76 0 
RLATGLRVRELVAYCAVEGT 

23761 GCCGCGTCCGAACGCGGCCCACGACATCCGCGGCCGCCTGCGGGAGCGCCTGCCCGACGG 23 820 
PRPNAAHDIRGRLRERLPDG 

23821 CTGGGTGCCGACCGTGTTCGTCGAGCGCCCGCCGGAGGAGATCCGGAAGGCCCTGGCCGC 2388 0 
WVPTVFVERPPEEIRKALAA 

238 81 CCGGGCGGCGGGCGGCGAACGGGCGGAGCCGCTGCCGCCGCCCGAGGACTGCGTCCCGCT 23 940 
RAAGGERAEPLPPPEDCVPL 

23941 TCCCGAGGAGGGCCGGCCCCCCTCGGACCCGTCCGAGCGGCGGCTGGCCGCGCTCTGGGC 24000 
PEEGRPPSDPSERRLAALWA 

24001 CGAGATCCTGGGCGCCCCGCCGAAGAGCGTGACCGAGCCCTTCTTCCGCGTCGGCGTCAC 24060 
EILGAPPKSVTEPFFRVGVT 

24 0 61 CGATAAGGACGCCCTCCGCTTCCTGGCCCGCGTGGCGGAGGACTTCGGCGTCACCGTGCC 2412 0 

DKDALRFLARVAEDFGVTVP 

24121 CTTCGCCGACTTCCTCAGCGCTCCCAACCTGCGTATGGTGAAGGACAATTTGGCTGAGAA 24180 
FADFLSAPNLRMVKDNLAEK 

24181 ACGGAGGGTGTAACGCGCAATGAGTGAGTGGTAGGGTCGGAATCGAACCGCACTGATCGG 2424 0 
R R V * 

24241 CAATCTTTTCGGTCAGCTGTTCCGGATATTCCGGGGCGCGTCGGCGCTCCCTCGACCAAG 24300 

243 01 GGCGTACGCGGATAAGCGTGCGCCGCCCCACGGCTGCGTCTCGACGCCTTCATCGGCGCG 24 3 6 0 

24361 TCGGACACTTCGCGGTGCCAGTCGGCACGCTCAGAGATCAGTGGAATGCCTCGGTGTGCC 2442 0 

M P R C A (orf26) 

24421 CGAGGTGCGCTCAGTACTGCTGTCCACACAACGCGCCAAGGGAGTTGGAACGTGATGGAG 2448 0 
RGALSTAVHTTRQGSWNVME 

244 81 ACGGCGAATTCCGGCTATCGGGTCTCACCTCAGCAGCGGCATTTATGGGCCATGCTGACC 24 54 0 

TANSGYRVS PQQRHLWAMLT 

24541 CGCGGGCGGGACGGCGGGCGACGTGCGTTCACCCAGTCCGCCGTGGTGGTCGACCGTTCC 24 60 0 
RGRDGGRRAFTQSAVVVDRS 

24601 CTGGACGCCGCACGTCTGCGCGCCGCGCTGGCCTCCGTGGTGGCCGCCCACGAGCCGCTG 24660 
LDAARLRAALASVVAAHEPL 

24661 CGGACGACCTTCACCGGTCTCGCGGGACGGACCGCGCCGGTCCAGGTCGTCCATGACCCG 24 720 
RTTFTGLAGRTAPVQVVHD P 

24721 GACGAGCAGCCGCTGTCCGTCGTCGACCTGCCGCCCTCGTGCGCCGACGGCTCGGGCCCG 24780 
DEQPLSVVDLPPSCADGSGP 

24781 GAACTGGACGAGCTCCGGCTCCGCGAACGCGCCGCCCTCGACCCGCGCGGCGGGCCCGTC 24840 
ELDELRLRERAALDPRGGPV 

24 841 TTCCGGGCCGCCCTGGCGCGGGCCGGCGAGGACCGGGCGGTGCTGGTGCTCACCGCGCAC 24 900 
FRAALARAGEDRAVLVLTAH 

24 901 GCCCTGGTCGCGGACCGGCTCTCCCTCCGGCTGCTGGCCGGGCAGATCCTCGCGGCGTAC 24 960 
ALVADRLSLRLLAGQILAAY 

24961 AGCGGGGAGACCGTGTCCCCCGATGGCCCGCCGCCCTTGCAGTACGCCGACTTCGCCGCC 25020 
SGETVSPDGPPPLQYADFAA 

2 5021 TGGCACCACGACCTGCTCACCGCCGAGGACGCCGCCCCCGACCGCGCGCACTGGGCCGCC 25080 
WHHDLLTAEDAAPDRAHWAA 

2 5081 CACACCGCCACCGCCGGCACCGGGCCGCTCCCCGGCGTCGTACGGCCCGGCGCCGCCCCG 25140 
HTATAGTGPL PGVVRPGAAP 

2 5141 GGTCCGTGGCGGGCGCGGGAGTGGGAACTGCCCGCCGAACTGGTGGCGGGGATCGACGGC 2 52 0 0 
GPWRAREWELPAELVAGIDG 

2 5201 GTCGCCGGGAAGCTGTCCACCGATCCCGCCACCGTGCTGCACGCCGCCTTCCGTATCGCG 25260 
VAGKLSTDPATVLHAAFRIA 

2 52 61 GTCTGGCGGCTCGCCGGCGAGCGGAACCTGCCCGTCGCCCTCACTCGTGACGGCCGTTCC 2 5320 
VWRLAGERNLPVALTRDGRS 

25321 CACCCCGAACTCCGCACCGCGATCGGCGCCTTCGAGCGTGAGCTCCCGCTCGTCCACGAG 253 80 
HPELRTAIGAFERELPLVHE 

2 53 81 ATCCGTCACGAGACGGCGTTCGCGGAATACGCGCGCGCTCTGGACGCGCTCGTCGCCGAG 2 5440 
I RHETAFAEYARALDALVAE 

2 5441 GGCGAGGAACTCCTCGACCATTGCGACCCGGAACTGCTCGGCAGCCTCGACGGCACCGCG 2 55 00 
GEELLDHCDPELLGSLDGTA 

2 5501 GAAGGGCCCTGCTTCACCTTCACCCACCACCAGGCCGAAACACCGGTCCGGCGGGCCGGC 25560 
EGPCFTFTHHQAETPVRRAG 

2 5561 ATCACCTTTACCACCGTCCATCAGGATTCGGGTACGCCGATTCCCGTCCGCCTGACCGCC 2 5620 
I TFTTVHQDSGTP I PVRLTA 

25621 CGACGCGACGGCGCCCGGCTGCGCATGGAACTGGGATACGACGAGGGCCGTATCGACGAG 2568 0 
RRDGARLRMELGYDEGRIDE 

2 5681 ACGTTTCCCGAGAACGCCGCCGCCTGCCTCACCCGCATTCTCGAAGGCGTCGTCTCCGCC 2 574 0 
TFPENAAACLTRI LEGVVSA 

2 5741 CCCGAGGGCCCGGTCGGCGACATCCGCATGCTGTCGGACGAGACCGCACGGCTGCTCCGG 258 0 0 
PEGPVGDIRMLSDETARLLR 

2 5801 GAAGCGGGGCTGGGCCCCCGCGTGGAACTTCCCGGCAAGGCGGTCCACGAACTCTTCGCC 2 5860 
EAGLGPRVELPGKAVHELFA 

25861 GAGCAGGCCGCGCGCACCCCCGGGGCGGTCGCGGTCAGCGCGGGCGAGGACGCCCTCACG 25920 
EQAARTPGAVAVSAGEDALT 

25921 TACGCCGAACTCGACGAGCGGTCCAACCGCCTGGCACACCACCTGACCGGGCTCGGGGTG 25980 
YAELDERSNRLAHHLTGLGV 

25981 ACACCCGGCCGGCACGTCGTGGTCTCGGTCGGCCGCTCCGCCGAGCTGCTCGTCGGGCTG 2 6040 
TPGRHVVVSVGRSAELLVGL 

26 041 CTCGGCGTGCTCAAGGCGGGTGGCGCCTTCGTCCCCGTCGACGTGGGCTTCCCCCGCAAA 2 610 0 
LGVLKAGGAFVPVDVGFPRK 

26101 CGGCTGGAGTTCGTGCTCCGGGAGACCGCCGCGCCGGTCCTGCTCTGCACCGCCGACGTA 2 6160 
RLEFVLRETAAPVLLCTADV 

26161 CGGGACCGCATCGGCACTCGGACCCTCGACGACGCCGGGGTGACACCCGTCGCGCTGGAC 26220 
RDRIGTRTLDDAGVTPVALD 

26221 GCCGACCGGCGGCGCATCGCCGCACACCCCGCCGGCCCCACCGGCATCGCCACCACCCCC 26280 
ADRRRIAAHPAGPTGIATTP 

26281 GACGCCCCCGCGTACGTCGTCTACACCTCCGGCACCACCGGGAAGCCCAACGGCGTACGC 2 63 4 0 
DAPAYVVYTSGTTGKPNGVR 

26341 GTCCCGCACCGGGGCCTCACCAACTACCTCACCTGGTGCACCGGCGCCTACGGACTCGAC 2 6400 
VPHRGLTNYLTWCTGAYGLD 

264 01 GGGGGCACCGGCACCCTCGTGCACACCTCCATCAGCTTCGACCTCACCCTCACCACCCTG 2 6460 
GGTGTLVHTSISFDLTLTTL 

26461 TTCGGCCCCCTGCTCGCCGGCGGGCAGGTGGTCATGCTCTCCGAGACCGCCGGCGTGACC 26520 
FGPLLAGGQVVMLSETAGVT 

26521 GGCCTGATCGCCGCGCTGCGCTCCCGGCGCGACCTCACCCTGGTCAAGCTGACCCCGACC 26580 
GLIAALRSRRDLTLVKLTPT 

26581 CACCTCGACGTCGTCAACCAGCTGCTCACCCCCGACGAGCTGCGCGGCGCGGTCCGCACC 2 6640 
HLDVVNQLLTPDELRGAVRT 

26641 CTCGTCGTCGGCGGGGAGGCGGTGCGGGCGGAGAGCCTGGAGCCGTTCCGGGCCTCCGGG 26700 
LVVGGEAVRAESLEPFRASG 

26701 ACGCGGGTCGTCAACGAGTACGGGCCCAGCGAGACGGTCGTCGGCAGCGTCGCGCACGTC 2 67 60 
TRVVNEYGPSETVVGSVAHV 

2 6761 GTCGACGCCGCCACGCCCCGTACCGGCCCGGTGCCCATCGGCCGGCCGATCGCCAACACC 2 682 0 
VDAATPRTGPVPIGRPIANT 

2 6821 ACCGTCCACCTGCTCGACCAGCGGCGGCGGCCCGTCCCCGACGGCGTCGTCGGCGAGCTG 2 68 80 
TVHLLDQRRRPVPDGVVGEL 

2 6 881 TGGATCGGCGGCGCCGGTGTCGCCGACGGCTACCTGGGGCGGCCGGAACTCACCGGCGAG 26940 
WIGGAGVADGYLGRPELTGE 

2 6941 CGCTTCCTCCCCAGCGACTACCCGCCGGACGGCGGCCGGGTCTACCGCACCGGCGACCTG 27000 
RFLPSDYPPDGGRVYRTGDL 

270 01 GCCCGCCGGCGCGCCGACGGCACCCTGGAGTACCTCGGGCGCACCGACGCGCAGGTGAAG 2 7060 
ARRRADGTLEYLGRTDAQVK 

27061 ATCCGCGGCGTCCGGGTGGAGCCCGCCGAGACCGAGGCCGTCCTCGCCTCCCACCCCGGC 2 712 0 
IRGVRVEPAETEAVLASHPG 

27121 GTCGGCCAGGCCGTCGTGGTCGCCCGGCTGGACGAGGACCCCGGCCGTTCGTCGCCGCTC 27180 
VGQAVVVARLDEDPGRSS PL 

27181 GCCGGCGAGCTGACGCTGACCGGCTACGTGGTCCCGGCCCGCGGTGCCCAGGCGCCCCCG 2 7240 
AGELTLTGYVVPARGAQAPP 

27241 CACGAGGAGCTCATCGCGTACTGCCGGGAGCGGCTGCCCGAGCACTTCGTCCCGGCCGTC 27300 
HEELIAYCRERLPEHFVPAV 

27301 CTCGTCACCCTCGACGCCCTGCCCGTCACCGGCCACGGCAAGATCGACCGCGGTGCGCTG 2 73 60 
LVTLDALPVTGHGKIDRGAL 

2 7361 CCCAAGCCGCACGCCCGGGCCCGGGACGGCGCGGCGTACGTCGCGCCGCGCACCGCCACC 27420 
PKPHARARDGAAYVAPRTAT 

27421 GAGGAGATCCTCGCGGCCACCGTCGCGAAGGTGCTGGGCGTCGAGCGCGTCGGCATCGAC 2 74 80 
EE I LAATVAKVLGVERVGID 

2 7481 GACAACTACTTCGTCCTGGGCGGCGACTCCATCCGCAGCGTCATGGTCGCCAGCCGGGCC 27540 
DNYFVLGGDS IRSVMVASRA 

27541 CAGGCCCGCGGGGTCGAGGTCACCGTGGCCGACCTGCACCGGCACCCCACCGTCCGGGCC 27600 
QARGVEVTVADLHRHPTVRA 

27 601 TGCGCCGCGCACCTGGACGCCCGCGAGGACCTGCCGCGGACGCCCGTCACCGAACCCTTC 27660 
CAAHLDAREDLPRTPVTEPF 

27661 GCGCTGATCTCCGCCGAGGACCGGGCGCTGGTGCCGGACGACGTCGAGGACGCCTTCCCG 27720 
ALISAEDRALVPDDVEDAFP 

27721 CTGAACCTGCTCCAGGAAGGCATGATCTTCCACCGCGACTTCGCGGCGAAGTCGGCCGTC 2 778 0 
LNLLQEGM I FHRD FAAKSAV 

27 781 TACCACGCCATCGCGTCCGTGCGGCTGCGCGCCCCGTTCGACCTCGCCGTGCTGCGGATG 2 7840 
YHAIASVRLRAPFDLAVLRM 

27841 GTCGTGCGCCAGCTCGTCGAGCGGCACCCGATGCTGCGCACCTCCTTCGACATGAGCCGC 2 790 0 
VVRQLVERHPMLRTSFDMSR 

27 901 TTCAGCCGCCCGCTGCAACTGGTGCACCGCGAGTTCGCCGATCCGCTGCACTACGAGGAC 2 7960 

FSRPLQLVHREFADPLHYED 

27961 CTGCGCGGCAGGAGCGCCGAGGAGCAGGACGCCCGCGTCGAGGAGTGGATCGAGCGGGAG 28020 
LRGRSAEEQDARVEEWIERE 

28 021 AAGGAACGCGGCTTCGAGCTGCACGAGTTCCCGCTGATCCGCTTCATGGCGCAGCGCCTG 28 08 0 

KERGFELHEFPLIRFMAQRL 

28 081 GAGGACGACGTCTTCCAGTTCACCTACGGCTTCCACCACGAGATCGTGGACGGCTGGAGC 2 814 0 
EDDVFQFTYGFHHEIVDGWS 

28141 GAAGCCCTGATGATCACCGAGCTGTTCAGCCACTACTTCTCGGTGATCTACGACGAGCCG 2 8200 
EALMITELFSHYFSVIYDEP 

28201 ATCGCGATCAAGCCACCCACCGCCGGCATGCGCGACGCCGTCGCCCTGGAGCTGGAGGCC 2 8260 
IAIKPPTAGMRDAVALELEA 

28261 CTCGCGGACCGCCGCAACTACGAGTTCTGGGACTCCTACCTCGCCGACGCCACCCTGATG 2832 0 
LADRRNYE FWDSYLADATLM 

28 321 CGGCTGCCCAGGCCCGGCACCGGACCCCGGGCCGACAAGGGCGACCGGGACATCACCCGC 2 83 80 
RLPRPGTGPRADKGDRDITR 

2 8 381 ATCGCCGTCCCCGTCCCCACCGAACTCTCCGACGGCCTCAAGCGGGTCGCCGCCACCCAC 2 8440 
IAVPVPTELSDGLKRVAATH 

28441 GCCGTCCCGCTGAAGACCGTGCTCCTGGCCGCGCACATGGTGGTGATGTCCCTCTACGGC 2 85 00 
AVPLKTVLLAAHMVVMSLYG 

28501 GGCCACGAGGACACCCTCACCTACACCGTCACCAACGGCCGCCCCGAGACCGCCGACGGC 28560 
GHEDTLTYTVTNGRPETADG 

2 8 561 AGCACCGCGATCGGGCTGTTCGTCAACAGCCTCGCGCTCCGCGTCCGGATGACCGGCGGC 28620 
STAIGLFVNSLALRVRMTGG 

2 8 621 ACCTGGGCCGACCTGATCACCGCCACGCTGGAGTCCGAGCGCGCCTCGATGCCGTACCGG 2 8 68 0 
TWADLITATLESERASMPYR 

2 8 681 CGGCTGCCGATGGCCGAACTCAAGCGCCACCAGGGCAACGAACCCCTGGCCGAGACGCTG 2 8 74 0 
RLPMAELKRHQGNEPLAETL 

28741 TTCTTCTTCACCAACTACCACGTCTTCCACGTGCTCGACCGCTGGATCGACCGCGGCGTC 28800 
FFFTNYHVFHVLDRWIDRGV 

28801 GGCCACGTCGCCAACGAGCTCTACGGCGAGTCCACCTTCCCCTTCTGCGGCATCTTCCGC 28860 
GHVANELYGESTFPFCGI FR 

2 88 61 CTGAACCGGGAGACCGGCGAGCTGGAGGTCCGCATCGAGTACGACAGCCTGCAGTTCTCC 2 8920 
LNRETGELEVRIEYDSLQFS 

2 8921 GACGCCCTCATGGAGAGCGTCCGCGACAGCTACGCCCGCGTCCTCGCGGCCCTGGTCGCC 2 89 80 
DALMESVRDSYARVLAALVA 

2 8 981 GACCCCGACGGGCGCTACGACCGGCACGAGTTCCGCTCCGACCGCGACCGGGCCGCACTG 29040 
DPDGRYDRHEFRSDRDRAAL 

29041 GCCGTCCTCACCCGCGGGCCCGAGGCGCCGGCGGCCGACCGGTGCCTGCACGACCTGGTG 29100 
AVLTRGPEAPAADRCLHDLV 

29101 GCGGACCGGGCGGCGGACCGCCCCGACGCCCCGGCCGTCCAGCTGGACACCGACGTGCTC 29160 
ADRAADRPDAPAVQLDTDVL 

29161 AGCTACGGCGAGCTCGACCGCCGCGCCAACCGGCTGGCCCACCACCTGCGTTCGCTCGGC 2 922 0 
SYGELDRRANRLAHHLRSLG 

29221 ATCGGCCCGGAGAGCGTCGTCGGCGTCCTGGCCGAACGCTCCCTCGCCCAGATCATCGGC 29280 
IGPESVVGVLAERSLAQI IG 

2 9281 CTCCTCGCGGTCCTCAAGGCGGGCGCCGCCTACGTCCCGCTCGACCCGGCCCAGCCCGAC 29340 
LLAVLKAGAAYVPLDPAQPD 

2 9341 GAGCGCCTCGCCGCCGTCATCGCCGGGAGCGGGGCCGCCGCCGTCCTCCACCGGCCCGGC 2 94 00 
ERLAAVIAGSGAAAVLHRPG 

29401 CTCGAAGGGCGGCTGCCCGCGGGCGTCCGCGCGCTCCCCACCGACGCCGCCGACGGCAGC 29460 
LEGRLPAGVRALPTDAADGS 

2 94 61 ACCGCCACGCACGACCCCGGGCCCACCGCCACGCCCCGCAACGCCGCGTACGTGATGTAC 2 9520 
TATHDPGPTATPRNAAYVMY 

2 9521 ACCTCCGGATCCACCGGAGAGCCCAAGGGCATCGTCGTCGAACACCGCAACGTCGTGGCC 2 9580 
TSGSTGEPKGIVVEHRNVVA 

2 9581 TCCCTCGCCGCCCGCGGCGCCCACTACGCGGCCGGACCCGGCCGGTTCCTGCTGCTGTCC 29640 

SLAARGAHYAAGPGRFLLLS 

29641 TCCTTCGCCTTCGACAGCTCGGTCGCCGGCATCTTCTGGACGCTGACCCAGGGCGGCACC 29700 
SFAFDSSVAGIFWTLTQGGT 

29701 CTCGTCCTGCCCGGCGAGGGACAGCAACTCGACCCCGCCGCGCTGGTGGAGACCATCGCC 2 9760 
LVLPGEGQQLDPAALVETIA 

29761 CGGCAACGGCCCACCCACACCCTCGCCATCCCCTCCCTGCTGGCGCCCGTCCTGGACCAG 2 98 2 0 
RQRPTHTLAIPSLLAPVLDQ 

29821 GCCGCCCCCGGCGACCTCGCCTCCCTGCGCACGGTGATCGCCGCGGGCGAGTCCTGTCCG 29880 
AAPGDLASLRTVIAAGESCP 

29881 GCCGAACTGGCCGCCGCCTGCCGGGACCTGCTGCCCGGGAGCACCTTCCACAACGAGTAC 2 9 940 
AELAAACRDLLPGSTFHNEY 

29 941 GGCCCCACCGAGACCACCGTGTGGAGCACCGTCTGGTCCCAGGAGAACGAGCACGACGGA 30000 

GPTETTVWSTVWSQENEHDG 

30 001 CCCCACCTCCCCATCGGCCGGCCGGTCGCGGGCACCTGGGTGCACCCCCGCGACCACCGC 3 0 060 

PHLPIGRPVAGTWVHPRDHR 

30061 GGACGCACCGTCCCCCTCGGCGTCGCCGGCGAACTCTCCATCGGCGGCGCCGGCGTGGCC 3 012 0 
GRTVPLGVAGELS IGGAGVA 

3 0121 CGCGGCTACCTCGGGCGCCCCCGGGACACCGCGGCCGCCTTCCGCCCCGACCCCGAGGCC 3 0180 

RGYLGRPRDTAAAFRPDPEA 

3 0181 ACGGCTCCCGGCGGCCGCGCCTACGCCACCGGCGACCTCGGCCGCTACCTCCCCGACGGC 3 024 0 
TAPGGRAYATGDLGRYLPDG 

30241 AACCTGGAGTTCCTCGGCCGCGCCGACCACCAGGTCAAGATCCGCGGCTTCCGGGTCGAG 3 03 00 
NLEFLGRADHQVKIRGFRVE 

303 01 CTCGGCGAGATCGAGGCCGTCCTCGACACCCACCCGGAGCTCCAGCGGACCATCGTCATG 3 03 60 
LGEIEAVLDTHPELQRTIVM 

3 0 3 61 GCACGCGGCGACCACCCCGGCGACCAGGTGCTCGTCGCCTACGTCCTCCCCGCCCCCGGC 3 0420 
ARGDHPGDQVLVAYVLPAPG 

3 0421 CGGCGGCCCGAACCCGCCGACATCCAGGGGTACGTCCGCGACCGGCTGCCCCGCTACATG 3 04 8 0 
RRPEPADIQGYVRDRLPRYM 

3 0481 GTGCCCACCGCGGTGATCGTCCTCGACGCGGTACCGCTGACCGCCGCCGGCAAGGTCGAC 3 0540 
VPTAVIVLDAVPLTAAGKVD 

30541 CGGGCCTCGCTCCCCGCCCCCAGCCACGCCCAGCTCACCCGGGACCAGGAGTACGTCGAG 30600 
RASLPAPSHAQLTRDQEYVE 

3 0601 CCCGGCACCGACACCGAGCGGGCGCTCGCCGCCATCTGGGCCGACGTCCTCAAACTGGAC 3 0660 
PGTDTERALAAIWADVLKLD 

3 0661 CGGATCGGGGCCGGTGACCGCTTCTTCGACGTCGGCGGCGAATCCCTGCGCGCGATGCAG 3 0720 
RIGAGDRFFDVGGESLRAMQ 

3 0 721 GCCACCGCCGCGGCCAACAAGATGTTCCGCACCCGCGTCTCCGTCCGCCGCCTCTTCGAG 30780 
ATAAANKMFRTRVSVRRLFE 

3 0781 GCGCCCTCCCTGCGGGAGTTCGCCCACGAGATCGACAAGGCCCGCCTCGCGGGCGGCGGG 3 0840 
APSLREFAHEIDKARLAGGG 

3 0 841 ACCGGCCTCACCGGCCCCGCGGCCGCCCCGGCCACCGGAGGTGCCGCCGAATGACCCCGG 30900 
TGLTGPAAAPATGGAAE* 

M T P A (orf25) 

3 0901 CCGCCGACACCACCCACCCGCTCTCGCCGGCCCAGCGCAGCATGTGGTTCCTGCACCGGC 3 0960 
ADTTHPLS PAQRSMWFLHRL 

3 0961 TCGCGCCCGAGGTGCCCGCCTACAACATCTGCACCGCCATCGAGCTCACCGGCACACCGC 31020 
APEVPAYNICTAIELTGTPR 

31021 GCCCGGCGGCGCTGCGGGACGTGGTACGGCGGCTCGGCCGCAGGCACGAGGCGCTGCGCA 31080 
PAALRDVVRRLGRRHEALRT 

31081 CGGTGTTCCCGTCGGTGGGGGAGACCCCCCGCCAACGGGTCACCGACCGGGCGGCGCCCC 3114 0 
VFPSVGETPRQRVTDRAAPL 

31141 TGCGGACCGTGGACCTCACCCACCTGACCCCCGCCGCCGCCGAGGCCGAGACCGCACGGA 312 00 
RTVDLTHLTPAAAEAETART 

31201 CGCTACGGTGCGCCGCCGCCCGGCCGTTCCGGCTCGACACCGGCCCCCTGGCGGAATGGA 31260 
LRCAAARP FRLDTGPLAEWT 

31261 CCCTGCTGCGCCGCGCCCCCGGCCACGCGCTGCTCGTCCTCTCCGTCCACCACATCGTCT 313 2 0 
LLRRAPGHALLVLSVHHIVF 

31321 TCGACGGCGGCTCGCTCCACGTGGTCTGCCGCGAACTGGAGGAGGCGTACGGAGCGGCCC 313 80 
DGGSLHVVCRELEEAYGAAL 

31381 TCGCCGGGCGCCCGGACCCCCTCGGCACACCCGCGCCGGGCTACGGACGGCAGTGCCGGA 31440 
AGRPDPLGTPAPGYGRQCRT 

31441 CGCGGGCGGCGGAACAGGACGAGGCCGGGCGGGAGTTCTGGCGCCGCGAACTGTCCGGCG 315 0 0 
RAAEQDEAGREFWRRELSGA 

315 01 CGCCACCCCGCACGACCGTCTTCCGGGGCACCGGCCGGCCCGCCGGACCGCCCGCCCGCG 315 60 

PPRTTVFRGTGRPAGPPARA 

31561 CCACCGTCCACTACGGCACCGACGATCCGGCCCCGACCGCGGACTTCTGCCGCGAGCACG 3162 0 
TVHYGTDDPAPTADFCREHA 

31621 CCGTCACCGGCTACGTGCTGCTGCTCGCGGCCCTCGCCTGCCTGGTCGCCCGGTACACCG 31680 
VTGYVLLLAALACLVARYTG 

316 81 GCCGGACGGACGTGGTGATCGGCTCAGCCGTCGGACTGCGCGAGGACCCCGAAGGGCTCG 31740 

RTDVVIGSPVGLREDPEGLA 

3 1741 CCACCGTCGGCCCGATGCTCAACCTGCTGCCGCTGCGCCTCCGGCTGCACGGCGACCCCG 3180 0 
TVGPMLNLLPLRLRLHGDPG 

318 01 GCTTCGGCGAGGTCCTGGCCCGCACCCGGGAGACGCTGCTCGGCGCGCTGGAGCACCGCA 3186 0 
FGEVLARTRETLLGALEHRT 

318 61 CCACACCGTTCGAGGACATCGTCGACGCGGTGGGCGCCGACCGGGACCCGGACGTCAGCC 31920 
TPFED IVDAVGADRDPDVS P 

3 1921 CCCTCTTCCAGATCCTCTTCGCCCACGAACGCCCCCCGGCCCCACCCGCGTTACCGGGCG 3198 0 
LFQILFAHERPPAPPALPGV 

31981 TCCGTGCCCGCGTCGTACCCGTCCCCGCTCCGGCCGCCAAGTACGAGCTCGCCGTCACCG 32040 
RARVVPVPAPAAKYELAVTA 

32 041 CCACCGAGACGCCCGACGGGCTCCGGCTGATCGTCGAGGCGGAGCACGGACACGGGGAAC 32100 
TETPDGLRLIVEAEHGHGEP 

32101 CGGCCGAACTCGCCGCCTTCGCCCGCCACTTCGGCGTCCTGCTGGCCGCCGGGGTCCGCG 32160 
AELAAFARHFGVLLAAGVRA 

32161 CGCCGGACACACCGCTGAGCCGCCTGCCGCTGCTCACCGACGAGGAGCGGCGCCGGCTCA 32220 
PDTPLSRLPLLTDEERRRLT 

32221 CCGACACCACGGCCCCCCGCACCGCGCCGGAGGCCCCCTACCGCCCCCTGCACCGGCTGG 322 80 
DTTAPRTAPEAPYRPLHRLV 

32 281 TCGAGGAGTCCGCCGCCCGCCGGCCCGACGCCCTGGCGGTCGTCGGCGGCACGCGTCACC 3234 0 
EESAARRPDALAVVGGTRHL 

32 341 TCAGCTACCGGGAGCTGAACTGCCGCGCCAACCGGCGTGCCGCCTGGCTGCGCCGCGCTG 32400 
SYRE LNCRANRRAAWLRRAG 

32401 GCATCGGCACCGAGGACGTGGTCGGCGTCCGGCTGGAACGCGGCCCGGAACTCCTCGTCT 32460 
IGTEDVVGVRLERGPELLVS 

32461 CGCTCCTCGCCGTCCTCAAGGCCGGCGCCGCCTACCTGCCCGTCGACCCGGCGCTGCCCG 32 52 0 
LLAVLKAGAAYLPVDPALPA 

32 521 CCGAGCGGGTACGGCTGATGCTCGACGACGCCCGGGCCGCGCTGCTGCTCACCGAGACCG 32 5 80 
ERVRLMLDDARAALLLTETA 

32581 CGCTCGGCACCCCGCCGGCCCCGGCCGGCACCCCCGTGCACCACGTGGACGGACCGCCAC 32640 
LGTPPAPAGTPVHHVDGPPP 

32 641 CGCCGACCCGGCCCGGGGACGACGCCGACCACACCGGCCCCGACCTGCCCACCAGCCTCG 3 27 00 
PTRPGDDADHTGPDLPTSLA 

32 7 01 CCTACCTCCTCTACACCTCCGGGTCGACGGGCCGGCCCAAGGCCGTGGCCCTCCAGCACG 3 27 60 
YLLYTSGSTGRPKAVALQHD 

32761 ACAGCGCCGCGGCGTTCCTGCGCTGGGCGGGCCGCGCCTTCGACGGCGGGGAGCTGGCCG 32820 
SAAAFLRWAGRAFDGGELAA 

32821 CCGTCCTGGCCACCACCTCCGCCGGCTTCGACCTGTCGGTCTTCGAGCTGTTCGCCCCCC 3 28 80 
VLATTSAGFDLSVFELFAPL 

32881 TGGCCCACGGCGGCACCGTCGTCCTCGCCGACAGCGCCCTGCACGTGCCCGCCCTGCCCT 3 2940 
AHGGTVVLADSALHVPALPW 

32 941 GGGCGCCCGCGGCGACGCTCCTGAACACCGTGCCCTCCGCGGCCGCCGCCCTGCTGGACG 33000 

APAATLLNTVPSAAAALLDA 

33 001 CCGACGGCCTGCCCGACGGTCTGACGGCCGTCAACCTGGCGGGCGAGCCCCTGACCGCGG 33 060 

DGLPDGLTAVNLAGEPLTAE 

33 061 AGCTGGTCGCCCGGCTGCACGCCCGCCTGCCGAAGGCCGCCGTCCGCAACCTCTACGGCC 3 312 0 
LVARLHARLPKAAVRNLYGP 

33121 CCTCGGAGGCCACCACCTACGCCACCGCGGCCCTCGTGCCCGCGGGCGGCACCGAGGCGC 3 318 0 
SEATTYATAALVPAGGTEAP 

33 181 CGGCCATCGGCCGGGCGCTCGGCGCGGCCCGCGTGTGGACCGCCGACGACCGGCAGCGCC 33240 
AIGRALGAARVWTADDRQRP 

33241 CCCTCCCCGGCGCGGTCGTCGGTGAACTCCTCATCGGCGGTACGGCCCCGGCCCGCGGCT 333 00 
LPGAVVGELL IGGTAPARGY 

3 3 3 01 ACCTCGGCCGGCCGGGACCGACCGCCGACGCCTTCCGGCCCGATCCGACGGGACCGCCCG 3 3 3 60 
LGRPGPTADAFRPDPTGPPG 

3 33 61 GCTCCCGGCTCTACCGCACCGGGGACCTGGCCGTACGCCGCCCCGACGGCCGGTTCGTGT 3 3420 
SRLYRTGDLAVRRPDGRFVF 

3 3421 TCCTCGGCCGCAAGGACGAGCAGATCAAACTCCGCGGGGTGCGCATCGAACCGGGCGAGG 334 80 

33481 TGGAAGCCGCTCTCCGCCAGTGCGCGCCGGTCGCCGCGGCCGCCGTCGTGCTCGCCGGGA 
EAALRQCAPVAAAAVVLAGT 

33 541 CCACCGCGGAGAACCACCGCCTCGTCGGCTTCGTCACCCCTTCGCCCGGCGCCCGCGTCG 



33 601 ACCCCGAGCGCACCCTCGCCGCGCTGCGTTCGCGCCTGCCCGCCGCCCTCGTGCCCGCCG 



33 661 CGCTGGTGGTGTGCGACGCCCTGCCGCTGACCGCCAACGGGAAGACCGACCGGGCCGCCC 



3 3 721 TCGCCCGGCGGGCGCGCGGACACCGGCCGGACCACGGCGCGTACGCCCCGCCCCGCACCC 



33781 GCGTCGAGAAGGCGGTCGCCGCGATCTGGCGCGAGGTGCTCGGGACCGAACGGGTGGGGA 



33 901 GGCTGGTCGCGTCCGTCCATCCCGGCCTCCGGCTCGCCGACGTCTTCCGGCTGCCGACCG 33 960 
LVASVHPGLRLADVFRLPTV 

33 961 TCGCCGCGCTCGCCGCGTTCGTGGACGGGCAGGAGGACGCGCGCGAGACGGCCGTCGGCG 3 4 02 0 

AALAAFVDGQEDARETAVGD 

34021 ACGCGGCCCTCCGGGCCGGCCGGCGCCGCGCCGCGGTGGCCGCGCGCCGCAGGAAAGGCG 34 080 
AALRAGRRRAAVAARRRKGG 

34 081 GCGGACGATGAGCCATGCCGACGCGGGCGACGGGCTCGACGCGGCTGACACGACTGACGC 3414 0 

MSHADAGDGLDAADTTDA (orf24) 
G R * 

34141 GGCCGACGGGATCGCCGTGATCTCGCTGGGCGGACGCTTCCCCGGAGCGGACCGGGTGGA 34200 
ADGIAVISLGGRFPGADRVD 

34201 CCGCCTCTGGACGAACCTGCTCGACCGCGAGGACGCCATCAGCCACTTCACCGCCGACGA 34260 
RLWTNLLDREDAISHFTADE 

34261 ACGCCTCGCCCGGGGCCGCGACCCCGAACTGGTGCGCCACCCGCGGTTCGTCGGCGCGGA 34320 
RLARGRDPELVRHPRFVGAE 

34321 AGGCGTCCTCGGCGACGTCTCCCTCTTCGACGCCGAGTTCTTCGGCTGCTCGCCGCGCGA 34380 
GVLGDVSLFDAEFFGCSPRE 

34 3 81 GGCCGAAGTCATGGACCCGCAGCACCGGCTCTGCCTGGAGGAGGCGTGGCACGTCTTCGA 34440 
AEVMDPQHRLCLEEAWHVFD 

34441 CACCGCCGGCTACGACCCGGCGGCGACGGGCACCGCGGTCGGGGTGTTCCTCTCCGCGAG 34500 
TAGYDPAATGTAVGVFLSAS 

345 01 CCTCAGCTCGTACCTGATCCGCAACGTCCTGCCCGGCGGCGCGGCACAGCGCCTGCTCGG 3 45 60 



34 621 ACTGGGCCTCACCGGGCCGAGTTACGCCGTCGGCTCGGCCTGCTCGTCCTCCCTCGTCGC 34680 
LGLTGPSYAVGSACSS SLVA 

34681 GGTGCACCTGGCCTGCCAGAGCCTGCTCACCGAGGAATGCGACATGGCGCTGGCCGGCGG 34 74 0 
VHLACQSLLTEECDMALAGG 

34800 

34801 ACCCGACGGGCGCTGCGCCCCCTTCGACGCCGGCGCGGCGGGCACGGTGGGCGGCAGCGG 34860 
PDGRCAPFDAGAAGTVGGSG 

34920 

34921 CGCGGTGATCCTCGGCTCGGCGGTGAACAACGACGGCGCCGACAAGGTCGGTTACACGGC 34980 
AVILGSAVNNDGADKVGYTA 

34981 GCCCGGCGTCACCGGCCAGAGCGCCGTCGTCGCCGAGGCCCTGGCGGTGGCCGGGATCTC 3 5040 
PGVTGQSAVVAEALAVAGI S 

35 041 CGCCGCGACCGTCGGCGTCCTGGAGGCGCACGGCACCGGCACCCGGCTGGGCGATCCCGT 35100 

AATVGVLEAHGTGTRLGDPV 

35101 CGAAGTGGCCGCGCTCACCCGGGCGTTCCGCGCCCACACGGACCGCAGCGGCTTCTGCGC 3 5160 
EVAALTRAFRAHTDRSGFCA 

35161 GCTGGGCTCGGTGAAGGCCAACGTGGGCCACCTGGACGCGGCGGCGGGCGTCACCGGGCT 3 5220 
LGSVKANVGHLDAAAGVTGL 

35221 GATCAAGGCCGTGCTGGCGGTCCGCGAGGGCGTCATCCCCGGCACCCCGCACTACCGTTC 352 80 
I KAVLAVREGVI PGTPHYRS 

3 52 81 GCCCAACCCCGCCATCGACTTCGCCACCACACCCTTCTACGTCACCGCCGACACCCTCGC 3 534 0 
PNPAI DFATTPFYVTADTLA 

35341 CTGGCCGGAGGCGGACCACCCCCGCCGGGCCGGCGTCAGCTCCTTCGGCATCGGGGGCAC 35400 
WPEADHPRRAGVSSFGIGGT 

3 5401 CAACGCCCACGTGATCCTGGAACAGGCCCCGCCGGCCGCCCCCCGCGCGGACCGGACCGC 35460 
NAHVI LEQAPPAAPRADRTA 

35461 CGGGGTGCCCATGCCGTTGGTGGTGTCCGCCCGCACCCGCGAAGCACTGGCGGAGGCCGT 3 5520 
GVPMPLVVSARTREALAEAV 

35521 CCGGGACCTGGCGGCGTGGTCGGCCCCGGAGCCGGGGACCCGGCTCGCCGATCTCGCCGC 35580 
RDLAAWSAPEPGTRLADLAA 

3 5 581 CACGCTGGCCGGGCGCCGGGCCTTCCCGTACCGCGCCGCCGTCGTGTGCCACGACCTGCC 3 564 0 
TLAGRRAFPYRAAVVCHDLP 

3 5 641 CGAGGCCGCGCGCCTGCTGGGCGGCGCGCGCGGCGAGACCGCGCTCCCCGGCAGGGAGGC 35700 
EAARLLGGARGETALPGREA 

357 01 CGTGTTCCTCTTCCCCGGGCAGGGCACCCTCCCGCCGGACACCGGGCGCGGCCTGTACGC 3 5760 
VFLFPGQGTLPPDTGRGLYA 

3 5761 GGACGTGCCGGCGTTCCGCGCCCACTTCGACGCCTGTGCCGAAGGGTTCGCCCCGCTCGG 3582 0 
DVPAFRAHFDACAEGFAPLG 

35821 CACCGACCTCCACGCCGCGCTCGGGGCCCCGGCCGACGACACCAGGGCCGCGCAACCCGC 35880 
TDLHAALGAPADDTRAAQPA 

35881 CCTCTTCGCCGTCGAGTACGCCCTCGCCCGCACCCTGATGGACTGGGGTGTGCGCCCGGC 3 594 0 
LFAVEYALARTLMDWGVRPA 

35941 CGCGATGCTCGGGCACAGCCTCGGCGAGTACGTCGCGGCGACGCTGGCCGGGGTGCTGTC 36000 
AMLGHSLGEYVAATLAGVLS 

36 001 CCTGCCGGACGCGCTGACGCTCGTCCGGGCCCGGGCGGAAGCGCAGCACACCATGCCGCC 3 60 60 

LPDALTLVRARAEAQHTMPP 

36 061 CGGCCGCATGCTCGCGGTCCCGCTCACGCCGGACGACCTGCGCCCGCTGCTGCCCCCGGA 3 6120 
GRMLAVPLTPDDLRPLLPPE 

3 6121 GGTGGAGTTCAGCGCCTTCAACGCCCCCGGCCGCTGCGTCGTCGGCGGGCCCCCGGAGCC 3 6180 
VEFSAFNAPGRCVVGGPPEP 

36181 GGTGGCGGAGCTGCGCGCCCGGCTGGCGCGGCGCGGAGTGCCGGCCGCCGAACTGGCCAC 3 6240 
VAELRARLARRGVPAAELAT 

36241 CGCGCACGCCTTCCACTCGGCGGCCGTCGAACCGCTGCTGGACGGCTTCCGGGGCGTGCT 3 63 00 
AHAFHSAAVEPLLDGFRGVL 

36301 GGAAGGCGTCCGACTGCGGCCGCCCCGGCTGCGGTACGTGTCCTCCCTCACCGGCGACTG 3 63 60 
EGVRLRPPRLRYVS SLTGDW 

363 61 GGCCGACGCCGCGGTCACCACCCCCGCGTACTGGCTCGCCCACCTGCGCCGGCCCGTCCG 3 6420 
ADAAVTTPAYWLAHLRRPVR 

36421 CTTCGCCGACGGCCTGCGGCGCTGCCTGGACCTCGGCCCCGTCGCCCTGGTCGAGACCGG 36480 
FADGLRRCLDLGPVALVETG 

3 6481 GCCGCGGGCCGGACTGACCGGCCTGGCCCGCCGCGCCGCGGGCCCCGGCGAGCCCCCTTA 36540 
PRAGLTGLARRAAGPGEPPY 

3 6541 CACCGTCCGCTGCCTGGCCGCCCCCGACGAGGCGGCTTCGCTGACCCACGCGGTCGCCGT 3 6600 
TVRCLAAPDEAASLTHAVAV 

3 6601 ACTCTGGCGCTCGGGCTGCGCCGTCGACTGGACGGCGTTCCACCGCCCCGGGCGCCCCCG 3 6660 
LWRSGCAVDWTAFHRPGRPR 

3 6661 CCGCACCACCGTGCCCGGCTACCCCTTCCAACGGGTACGGCACTGGATCGACGCGCCGGA 3 672 0 
RTTVPGYPFQRVRHWIDAPD 

36721 CGAGTCCGAACCCACGGACCTCGCCACCGCCCTGCGCGCGGAGTTGCGGACGGACGGCGA 36780 
ESEPTDLATALRAELRTDGD 

36781 TCCGCCGCTCGCCGTCGATCAGCGGCCCGGACTGCGCACGGGGCTGAACCGGCTGTGCGC 3684 0 
PPLAVDQRPGLRTGLNRLCA 

36841 CGCCCTGGCCCGCGACTACCTGGCCACCGGCGTCGAAGCGAGCGGGGTCCTGCCCGGATT 3 690 0 
ALARDYLATGVEASGVLPGF 

3 6901 CCACCGCTTCCTGGACTACCTGCGCACCCTGGCCGCCTCCGCACCGGCCGCGGACGACGC 3 6960 
HRFLDYLRTLAASAPAADDA 

3 6961 GGGGACGATCGCCGCGGAGATCACCGCGGCCCACCCGTCCTTCTCCGGGCTCGTCGACCT 37020 
GTIAAEITAAHPSFSGLVDL 

37021 GCTCCGGCACTGCGCCCAGGGCTATCCGCGCGCCCTGTCCACCCCCGGAGCCGCACTGGA 37080 
LRHCAQGYPRALSTPGAALD 

3 7081 CGTCCTCTATCCGGCCGGCAGCGGCGACCTCCTGCGCCGCACCCTGGGCGAGGGCACCGC 3 714 0 
VLYPAGSGDLLRRTLGEGTA 

3 7141 CGACCACCGCGCCACCGGCCGCCTCACCCGGCTGGCCGGCTCCCTGCTCGACCGGCTCGC 3 7200 
DHRATGRLTRLAGS LLDRLA 

3 7201 GGCCGACCGCGAACCCGGCCGCCCGCTGCGCGTCCTGGAGGCCGGAGCGGGCGCGGGCAG 37260 
ADREPGRPLRVLEAGAGAGS 

3 7261 CCTCACCCAGGCCCTGGTCACCCGGGCCCCCGGCCGGCTCGACTACCACGCCACCGACAT 3 7320 
LTQALVTRAPGRLDYHATD I 

37321 CTCCCGGCACTTCGTGACCGCACTCGGCCGGGAGGCCGCCCGGCGCGGCCTGGACTTCGT 37380 
SRHFVTALGREAARRGLDFV 

3 7381 CCGCGCACGCGTCCTCGACATCGCCCGCGACCCAGGCGAACAGGGCTTCGCCGGCGAGCG 37440 
RARVLD IARDPGEQGFAGER 

3 7441 GTTCGACGTCGTCTGCGGCCTCGACGTGGTCCACGCCACCCCCGACCTGCGCACCACGCT 37500 
FDVVCGLDVVHATPDLRTTL 

37501 CGGCCATCTGCGCTCCCTGATGGCACCGGACGGCACCCTCGCGCTGATCGAGACCACCGC 37560 
GHLRSLMAPDGTLALIETTA 

3 7561 CGACGACCCCTGGCTGACGATGATCTGGGGCCTGACGGACGGCTGGTGGCACCACACCGA 3 762 0 
DDPWLTMIWGLTDGWWHHTD 

3 7621 CCGGCGCACCCACGGCCCGCTGCTCGACGCCGCCGGCTGGCGCGCCCTCCTGGCCGGCGA 3768 0 
RRTHGPLLDAAGWRALLAGE 

3 7681 GGACTTCGCCACGGCCGATGTGATCGTGCCGCCCGACGGCCCCCAGGACGCGGCCCTGCT 37740 
DFATADVIVPPDGPQDAALL 

3 7741 GCTCGCCCGGCAGACCCCCCGGCCGGCGGCGGCCGCACCGTCCGTCGGCAAGCGGGACGT 3 78 00 
LARQTPRPAAAAPSVGKRDV 

3 78 01 CGGCACGTGGTGCTACGCCCGCGGCTGGCGGCACGCCGCGCCCGCCGACCCCGCCCCGCT 3 7860 
GTWCYARGWRHAAPADPAPL 

37861 GACGGGCGGCTGCCTGCTGCTGGGCGACGGGGACACGGCGAAGGCCGTCGCGAGCCGGCT 37920 
TGGCLLLGDGDTAKAVASRL 

3 7 921 GGAGGCCCTCGGCGTGCCCGTCACCACCGTCGGCGGCGGCCGACCGCCGGGCCCCGAGCG 3 7980 
EALGVPVTTVGGGRPPGPER 

3 7981 GTACCGGGAACTCGTCGGCCCCGCCACCCGCCTGGCCGTCGACCTGTGGCCGCTGCGCGA 38040 
YRELVGPATRLAVDLWPLRD 

38041 CGCGTCCCACCGCGGCCGCGCCGCCGGCGCCGCCGGCGTACGGACCGCCCAGGACGCCGC 38100 
ASHRGRAAGAAGVRTAQDAA 

3 8101 GCTGCACAACCTGCTCCACCTCGCCCGGGCCTTCGGCGCGCTGGAGGAGCGCCACCCCGC 3 8160 
LHNLLHLARAFGALEERHPA 

3 8161 CCGCGTCGTGACCGTGACCACCGGTGCCCACGACGTGCTCGGCGACGACCTCGCCCACCC 3 822 0 
RVVTVTTGAHDVLGDDLAHP 

3 8221 CGAGCACGCCACCGTCCCGGCCGCGGCCAAGGTGATCCCCCGGGAGTACCCGTGGATCGC 38280 
EHATVPAAAKVI PREYPWIA 

3 8281 CTGCACCGCCCTGGACGTGGAGCCGGGCCTGGACGCCGAGCGGCTGGCGGACCTGATCGT 3834 0 
CTALDVEPGLDAERLADLIV 

38341 CCGGGAACTCGGCGCGGCGCGCGAGACCACCGTCACCGCCTGCCGCGGCCGACGCCGCTT 38400 
RELGAARETTVTACRGRRRF 

38401 CACCCCCTGCCCCGTCCGGCAGCCCCTCCCCGCCGCACCGGAACGCCCGGCGGTCCGGCC 38460 
TPCPVRQPLPAAPERPAVRP 

38461 CGGCGGCGTCTACCTCGTCTGCGGCGGCCTCGGCGGCATCGGCCTCCACCTCGCCGAGTA 3 8520 
GGVYLVCGGLGGIGLHLAEY 

38 521 CCTGGGCCGCGCCCGCACCACCGTCGTCCTCACCCACCGGCGGCCCTTTCCCGCCCCCGG 3 858 0 
LGRARTTVVLTHRRPFPAPG 

38581 CGCGTGGGACGGGCTGCCCGCGGGACACCCGGAGGCGGCCGTCGTCCGGCGGCTGCGCTC 3 864 0 
AWDGLPAGHPEAAVVRRLRS 

38 641 CCTCGCCGCCACCGGCGCCACGGTCGTCGTCCGCCGGGCCGACCTCACCGACCACGACGC 3 8 700 
LAATGATVVVRRADLTDHDA 

3 87 01 GATGCGCGCCCTCGCGGACGAGGTGGAACAGGCCCACGGCCCCGTCCGGGGGGTGGTGCA 3 87 60 
MRALADEVEQAHGPVRGVVH 

38761 CGCGGCCGGGGTGCCCGACACCGCCGGCATGATCCAGCGTCGCGACCGAGCCGGCACGGA 3 882 0 
AAGVPDTAGMIQRRDRAGTD 

38 821 CGCCGCCCTCGCCGCCAAACTGACCGGCACCCTCGTCCTGGACGAGGTGTTCGCCCACCG 38880 
AALAAKLTGTLVLDEVFAHR 

3 8 881 CGACCTCGACTTCCTCGTCCTGTGCTCCTCGATCGGCACCGTGCTGCACAAGCTGAAGTT 38940 
DLDFLVLCSSIGTVLHKLKF 

3 8941 CGGCGAGGTCGGCTACGTGGCGGGCAACGAGTTCCTCGACGCCTATGCCGCCCACCGCGC 3 90 00 
GEVGYVAGNEFLDAYAAHRA 

3 90 01 GGCCCGCCGCCCCGGCAGAACCCTGTCGATCGCCTGGACCGACTGGCGGGAGTCGGGCAT 3 90 60 
ARRPGRTLS IAWTDWRESGM 

3 9061 GTGGGCCGCCGCCCAGCGCCGTCTGACCGAGCGCTACGGCACCGGCGCCGACCTGCCCGT 3 9120 
WAAAQRRLTERYGTGADLPV 

3 9121 ACCGCCCGGGGGCGACCTGCTCGGCGCGATCAGCCCCGAGGAGGGCGTCGACGTCTTCGC 3 918 0 
PPGGDLLGAI S PEEGVDVFA 

3 9181 CCGGCTGCTCGCCGCCGACACCGGCCCGAACGTCATCGTGTCGGCCCAGGACCTCGACGA 3924 0 
RLLAADTGPNVIVSAQDLDE 

39241 ACTCCTCGCGCGGCACGCGGCGTACACCACCGACGACCACCTCGCCGCCCTCGGCGACCT 39300 
LLARHAAYTTDDHLAALGDL 

3 93 01 GAGGATCGCCGCCGCCCGGGACCGCTCCGCGCCCGCCGCGCCGTACGCGGCCCCCCACAC 3 9360 
RIAAARDRSAPAAPYAAPHT 

39361 GCCCGCCCAGCGGCGGATCGCCGGCTGGTACCGCGACCTGCTCGGCGTCGAACACGTCGG 39420 
PAQRRIAGWYRDLLGVEHVG 

39421 CCTCGACGACGACTTCTTCGCGCTCGGCGGGGACTCGCTGCTCGCCCTGCGCCTGCTGTC 39480 
LDDDFFALGGDSLLALRLLS 

394 81 GCAGCTGCGGGACGCCTACGGGGTGGAGATCTCCGTCGCCCGCATGTTCGACGAGCCCAC 3 9540 
QLRDAYGVE ISVARMFDEPT 

3 9541 GGTGGCGGCGCTGGCCGCCGCCACCGGCCCGCCGCCGGAAGAGACGCCCGGCCAGGAAGA 39600 
VAALAAATGPPPEETPGQEE 

39601 GGTGGTGCTGTGACCACGCCCCGCATCACCGACCTGCTCACCGAGCTCCGCGGCCGGCAG 3 9660 
V V L * 

MTTPRITDLLTELRGRQ (orf23) 

39661 GTGACCCTCACGGCCGACGGGGACCGGCTGCACTGCCGCGCGCCCCGGGGCGCGCTCACC 39720 
VTLTADGDRLHCRAPRGALT 

3 9721 GACGAGCTCCTCGCCACCATCCGCGCCCGCCGCGACGAACTCCTCGCCCACCTGCGCGCC 39780 
DELLATIRARRDELLAHLRA 

39781 GACCGCCGCATCCCGCGCCACGACGGGCCCGCGCCGCTGTCCTTCGCCCAGGAACGGCTC 39840 
DRRI PRHDGPAPLSFAQERL 

3 9841 TGGCTCCTCCACCAGTTCCACCCGCACGACAGCGCCTACAACATCCCCCTGCACATCGCC 3 99 00 

WLLHQFHPHDSAYNIPLHIA 

399 01 CTGCGCGGGCCCCTGAACCCGGCCGCCCTGCGCGCCGCCCTGGCCGAGGTGGTACGGCGG 3 99 60 
LRGPLNPAALRAALAEVVRR 

39961 CACGACGTCCTGCGCACCCGGTACGCCATCAGCCGCGGCCTGCCCCGGCCCGTCGTCGAA 4 0020 
HDVLRTRYAISRGLPRPVVE 

4 0021 CCGGCCCACACGCCGCCGCTGCCCCTGACCGACCTGACCGGGCTCCCCGCACACCACCGG 4 0 0 80 

PAHTPPLPLTDLTGLPAHHR 

4 00 81 GACGCCGAACTCGCCCGGCTGGCCGCCCAGGAGGCCAGGCGGCCCTTCGACCTCGCCCAG 4 014 0 
DAELARLAAQEARRPFDLAQ 

4 0141 GGCCCGGTGCTGCGGGCCCGGCTCCTCCGAACGGCCCCCGAGGAGCACCGGCTGCTGCTG 4 0200 
GPVLRARLLRTAPEEHRLLL 

40201 ACCCGCCATCACATCGCCAGCGACGGCTGGTCGCTCGACATCCTGCTCCGCGAACTGGGC 40260 
TRHHIASDGWSLDILLRELG 

4 02 61 ACGTTCTACCGGGCAGGGCGGGACGGCACACCCGCCGGCCTCGACGCCCTGCCGCTGCGG 4 0320 
TFYRAGRDGTPAGLDALPLR 

4 0321 TACGCCGACTTCGCCGCGTACCAGCGCGAACAGGCCGAACGGCCGGAGACGGCCGAGCGG 4 03 8 0 
YADFAAYQREQAERPETAER 

40381 TCGACCCGCTGGGCACGGCACCTGAGGGGCGCCCCCGCGACACTCGACGTCCTCGGGCCC 40440 
STRWARHLRGAPATLDVLGP 

40441 CCGCCCGCCGAACCCTCCCACGCGCCGGCCGGCACCGTACGGACGGACCTTCCCGCCGCC 40500 
PPAEPSHAPAGTVRTDLPAA 

40 501 CTCGTCACCGGCCTGCGGCAGCTGGGCGGCCGGGCCCGCACCACGCTCTTCCCGCTCCTG 4 05 60 
LVTGLRQLGGRARTTLFPLL 

40561 CTGAGCGCCTTCGGCCTCGCCCTGGCCGGCCCGCCCGGCCCGTACGACGTCATGGTCGGC 4 062 0 
LSAFGLALAGPPGPYDVMVG 

40 621 ATCCCCGTCGCCGGCCGGCCGCGCACCGAACTGGAGCCGCTCATCGGCTGCTTCGCGACC 4 068 0 
IPVAGRPRTELEPLIGCFAT 

40 6 81 ATCGCGCCGATGCGGCTGACGAGCGACGGGACCGAGCCGCTGACCCGGCTCGCCGCCCGC 4 0 74 0 
IAPMRLTSDGTEPLTRLAAR 

40741 GCCCAGCAGCACGTCCAGGACGCGCTGGACGGACCCGACGTCCCCTTCGAGCGGCTCGTG 40800 
AQQHVQDALDGPDVPFERLV 







4 0 8 01 CACGCGCTGCGTCCGGAGCGGGACCTCGCGGAGAACCCCCTGTTCTCGGCGTCGTTCGCC 4 0860 
HALRPERDLAENPLFSASFA 

4 0861 TTCCAGAACACCCCGCGGACCGCCGTGCGCCTCCCCGGCCTGGACGCCGAGGTGCTGCCC 4 0920 
FQNTPRTAVRLPGLDAEVLP 

4 0921 TCGCCGCCCGTGGCCCCCAAGTTCCCGCTGGCCCTCACCGCGACGGCGCGGGCCGACGGC 40980 
SPPVAPKFPLALTATARADG 

4 0 981 GGAATGGGCCTGGAGCTGGAGTTCGACCGGGACCGGATCGCCGAGCCGGTCGCGCGGGGG 41040 
GMGLELEFDRDRIAEPVARG 

41041 ATCCTCACGTCCTTCCACGCCGCCCTCGCCCGCGCGGTCGCCGACCCCGAGGCCCCGGCG 41100 
I LTSFHAALARAVADPEAPA 

41101 GCGCCCGTACCGGCCGCCGCCGTGGACCGGCGGCCCGGGCGCGAAGGACACGAGTGCCTC 41160 
APVPAAAVDRRPGREGHECL 

41161 CACGAGCCGGTGGCGCGGGCGGCGGCACGCCACCCCGACGCCGTCGCCGTCAGCTGCGGC 41220 
HEPVARAAARHPDAVAVS CG 

41221 GGCACCCAGCTCAGCTACGGGGCGCTCGACACCCGCGCCGAACGGCTGGCCGCGGTGCTG 412 80 
GTQLSYGALDTRAERLAAVL 

412 81 CGCGCCCACGGCGCCGGCCCCGAGCGGCTGGTGGCCCTGTGCCTGCCCACCGGCCCCGAA 41340 
RAHGAGPERLVALCLPTGPE 

41341 TGGGTCGTCGGCGCCCTCGCCATCCTCAAGTCCGGCGCCGCCTACCTGCCGCTCGACCCC 414 00 
WVVGALAILKSGAAYLPLDP 

414 01 GGCGACCCGGCCGAGCGCCGCGCCTCCGTCGCCGCCGACGCGGGAGCGACGCTGATCGTC 414 60 
GDPAERRASVAADAGATLIV 

414 61 TCCGACACCGCGCTTCCCCCGCTCCACCGCGTCGACGTCACGGCCACCCTCCCGGACGGC 41520 
SDTALPPLHRVDVTATLPDG 

41521 GCCCCCGAGCCCACCGCCCGGGCCGTCCTGCCCGGCAACCTCGCCTACGCCGTCTACACC 41580 
APEPTARAVLPGNLAYAVYT 

41581 TCCGGCTCCACCGGCGGCCCCAAGGGCGTGCTCGTCACCCATGCCAACGTCACCGGGCTC 41640 
SGSTGGPKGVLVTHANVTGL 

41641 CTGGCCGCGTGCCGTGAGGCCCTGCCCGCCCTGGACGCCCCCCGGACCTGGTCGGCGACC 41700
LAACREALPALDAPRTWSAT

41701 CACTCGCCGGCCTTCGACTTCTCCGTCTGGGAGGTCTGGGGCCCGCTGACCGCCGGCGGA 41760
HSPAFDFSVWEVWGPLTAGG 

41761 CGCCTCGTCCTCGTGCCCCCGGACGTGGCCCGGGCCCCGGACGAACTGTGGGACACCCTC 41820 
RLVLVPPDVARAPDELWDTL 

41821 CGCGACGAACAGGTCGAAGTCCTCAGCCAGACCCCCAGCGCGTTCCACCACCTCCTGCCC 41880
RDEQVEVLSQTPSAFHHLLP 

41881 ACCGCCGTGCGCCGGGCGGCCCAGGCCACCGCGCTCGAACTCGTCGTCCTGGGCGGCGAG 41940 
TAVRRAAQATALELVVLGGE 

41941 GCGTGCGAGCCCGCCCGTCTGACGCCTTGGTGGGACGCCCTGGGCGACCGGCGCCCGGCC 42000 
ACEPARLTPWWDALGDRRPA 

42001 GTGGTCAACATGTACGGCATCACCGAGAACACCATCCACGTCACCGTCCGCCGGATGACG 42060
VVNMYGITENTIHVTVRRMT 

42061 GCGGCGGACCGGTCGGGCAGTCCCGTCGGCCGGCCGCTGCCGGGGCAGCGCGCCGACCTT 42120 
AADRSGSPVGRPLPGQRADL 

42121 CTCGACCCCCACGGCCGGCCCGTCGCGCCGGGCGGGCGGGGCGAACTGTTCGTCGGCGGC 42180
LDPHGRPVAPGGRGELFVGG 

42181 GTCGGACTGGCCCGCGGCTACCTCGGCCGGCCCGGCCTCACCGCCCGGAGCTTCCTGCCG 42240 
VGLARGYLGRPGLTARSFLP 

42241 GACGACACCCCCGGCTGGCCGGGCGCGCGCCGCTACCGCTCCGGAGACCTGGCCCGGCTG 42300 
DDTPGWPGARRYRSGDLARL

42301 CTGCCCGACGGCGGCCTGGACTACGCGGGCCGCTCCGACGCACAGGTCAAGGTCCGCGGC 42360
LPDGGLDYAGRSDAQVKVRG

42361 TACCGCGTCGAGCCCGCCGAGACCGAAGCCGCCGCGCTGACCCATCCCGCCGTGCGCCAC 42420
YRVEPAETEAAALTHPAVRH 

42421 TGCGTGGTCGTGCCACGCGGCGACGGCGACCGGCGCCATCTCGCGGCGTACGTCGTCGCC 42480 
CVVVPRGDGDRRHLAAYVVA 

42481 GACACCCGCGCCTGCGACGGGCCCGGGCTCCGCACCCACCTGGCCGAGCGGCTGCCCCGC 42540 
DTRACDGPGLRTHLAERLPR 

42541 CACCTGGTGCCGGCCTCGGTGGTCTTCCTGAAGCGGATCCCGCTGACCCGCAACGGCAAG 42600 
HLVPASVVFLKRIPLTRNGK

42601 CTCGACGTGGCGGCCTTGCCCGACCCGGCCGCCCACCGCGCACCCGCCCGCGAACGCCCG 42660
LDVAALPDPAAHRAPARERP

42661 CGCACCGCGACCGAACGGACCCTCACCCGGCTGCTCGCCGCCCTCCTGAAGGCGCCACCG 42720
RTATERTLTRLLAALLKAPP

42721 GAGACCATCGGGACGCACGACAACCTCTTCGACCTGGGCGGCGACTCCCTGACGGTCACC 42780
ETIGTHDNLFDLGGDSLTVT

42781 CAGTTCCACTCCCGGGTGGTGGAGGAGTTCGCCGTGGACCTCCCGGTGCGCCGGGTCTAC 42840
QFHSRVVEEFAVDLPVRRVY

42841 CAGGCCCTCGACATCGCGACGCTCGCCGTGACCGTGGACGACTTCCGGCGCCGCGCCGAA 42900
QALDIATLAVTVDDFRRRAE

42901 CGCACCGCGGTACTGCGCGCCCTCGCGGCGGCGGAGGCGATGGAACCCGGCGGTACGGCG 42960
RTAVLRALAAAEAMEPGGTA

42961 GGGGAGTCCGGCGGTAATCCGGAGGAGTCCGCCGCTACGGCGCGGGGGCCCGCCGTCGCG 43020
GESGGNPEESAATARGPAVA

43021 GCGAACGAACCCGGCGCTGCGGCGCGTGAGTCCGGCGCCGCGCCGGTGGAGCCCGCCGTC 43080
ANEPGAAARESGAAPVEPAV

43081 GCAGTACAGGAGTCCGCCGCTACGAAGGGGGAGCCCGGCACCGCAGCGAATGAACTCGGC 43140
AVQESAATKGEPGTAANELG

43141 GCTGAGGCACGGGAGCCCGGCACCGCAGCGCAGGAACCCGGCACCGACCCCCGGCCACCC 43200
AEAREPGTAAQEPGTDPRPP

43201 GCCGCCACACCGCAGGACCCCCGCACCACACCGCAGGAAGGACAGCCGTGCCCGCGTCCC 43260
AATPQDPRTTPQEGQPCPRP 

43261 GAATGAGCCGGCCGGCCGGCATCGTCGACATCGCGCGCCGTCACGCCGAGCGCACCCCCG 43320
MSRPAGIVDIARRHAERTPA (orf22)
E *

43321 CCCGTCCCGCGTACGCGTTCCTGCCCGACGGCGAGACGGAGAGCGTCCGCTTCTCCTTCG 43380
RPAYAFLPDGETESVRFSFA

43381 CCGACATCGACCGGCGGGCCCGCGCCGTGGCCGCCGTCCTCCAGGACCGCGGCCTGGCCG 43440
DIDRRARAVAAVLQDRGLAG

43441 GGGAGCGGGTCCTGGTCGCCTATCCCTCCGGGCCCGAGTACGTCCAGGCGTTCCTGGGCT 43500
ERVLVAYPSGPEYVQAFLGC

43501 GCCTGTACGCGGGCGTGGTCGCCGTCCCCTGCGACGAGCCGCGCTCCGGCCCGAGCGCGG 43560
LYAGVVAVPCDEPRSGPSAE

43561 AACGGCTCGCCGGGATCCGCGCCGACGCCCGCCCCGCCCTGGCCCTGACCGCCGGCGCCC 43620
RLAGIRADARPALALTAGAP 

43621 CCGAGGCCGGGCTCGCCGGCCTGGCCACCCTGGACGTGGCCGGCGTCCCCGACTCCGCCG 43680 
EAGLAGLATLDVAGVPDSAA 

43681 CCGGGGCCTGGACCGACCCCGTCGCGGGACCGGACGCCCTGGCCTTCCTCCAGTACACCT 43740
GAWTDPVAGPDALAFLQYTS

43741 CCGGATCGACCCGCCGCCCCCGCGGCGTCATGGTCGGCCACGGCAATCTGCTGGCCAACG 43800
GSTRRPRGVMVGHGNLLANE

43801 AGCGCTGCATCGCCGCCGCCTGCGGCCACGACCGGGACTCCACCTTCGTGGGATGGGCGC 43860
RCIAAACGHDRDSTFVGWAP

43861 CGTTCTTCCACGACATGGGCCTGGTCGCCAACCTCCTCCAGCCCCTCTACCTCGGGTCCC 43920
FFHDMGLVANLLQPLYLGSL

43921 TGTCGGTGCTGATGCCGCCGATGGCCTTCCTCCAGCGCCCGGCCCGCTGGCTGCGGGCCG 43980
SVLMPPMAFLQRPARWLRAV

43981 TCTCCCGCTACCGGGCGCACACCAGCGGCGGCCCCAACTTCGCCTACGACCTGTGTGTCG 44040
SRYRAHTSGGPNFAYDLCVD

44041 ACCGGGTCGGCGAGGACGAGCGGGCCGGACTGGACCTGTCGGGCTGGAAGGTCGCCTACA 44100 
RVGEDERAGLDLSGWKVAYN 

44101 ACGGCGCGGAACCTGTACGGGCCGACACCCTGCGACGGTTCACCGACCGCTTCGCCCCCC 44160 
GAEPVRADTLRRFTDRFAPH 

44161 ACGGCTTCACCCCCGGCGCGCACTTCCCGACCTACGGGCTCGCCGAGGCGACCCTGCTCG 44220
GFTPGAHFPTYGLAEATLLV

44221 TCGCCACCGGCCCCAAGGGAGTGCCGCCCCGCACCCTGACCGCCGACCGCGCCGCCCTGC 44280
ATGPKGVPPRTLTADRAALR

44281 GCGCCGGCCGGCTCCGGCCCGCCGGGCCCGGCGAGGCCGGCCTGGAACTGGTCGGCAACG 44340
AGRLRPAGPGEAGLELVGNG

44341 GCACCGCCGGCCTCGACACCACCCTCCGGATCGTCGACCCCGCGACCGCGCGGGAGTGCC 44400
TAGLDTTLRIVDPATARECP 

44401 CGCCCGGAGAGGTCGGCGAGGTCTGGGTGCGCGGCCCGGGCGTGGCACGCGGCTACTTCG 44460 
PGEVGEVWVRGPGVARGYFG 

44461 GCCGCCCGCGCGAGTCCGCGCCGCTGCTCGCCGCCCGCCTGCCCGGCGGCGAAGGACCGT 44520
RPRESAPLLAARLPGGEGPY

44521 ACCTGCGGACCGGGGACCTGGGCGCCCTGCACGACGGGGAACTCTTCCTCACCGGACGCC 44580
LRTGDLGALHDGELFLTGRH

44581 ACAAGGACCTCATCGTCATCCGCGGCCAGAACCACCACCCGCACGACCTCGAACGGACCG 44640 
KDLIVIRGQNHHPHDLERTA 

44641 CCGAGCAGGCCCACCCGGCGCTCCGCCCGACCTGCGCCGCCGCGTTCGCGGTGCCCGGGG 44700
EQAHPALRPTCAAAFAVPGD 

44701 ACGGCGCGGAGCGGCTCGTGCTCGTCTGCGAACTCACCTCCTACCGCGCCGTCGACCCGG 44760 
GAERLVLVCELTSYRAVDPA 

44761 CCGCCGTCGCCGAGGCCGTCCGGGCCGCGCTCGCCGCGCGGCACGGCGTCGCCCCGCACA 44820
AVAEAVRAALAARHGVAPHT 

44821 CGCTGGTGGTGCTGCGCCGCGGCGGCATCCCCAAGACCACCAGCGGAAAGGTGCGGCGCG 44880 
LVVLRRGGIPKTTSGKVRRG

44881 GCCACTGCCGGACGGCCTACCTCGACGGAACGCTCCCCGTTCACACGGCCGTCCGCCTCC 44940
HCRTAYLDGTLPVHTAVRLP

44941 CGGCGGGGGAGGAGGGCACCGAGGCCCTTCCCCTGACCACGGACCCCGGTCGGCTGGCCA 45000
AGEEGTEALPLTTDPGRLAT

45001 CGGCGCTGCGCGACCTGGCCGCCGCCCACGCGGGCCTGGCCGGGCCCCTCCCCGGCACCG 45060
ALRDLAAAHAGLAGPLPGTD

45061 ACGAGCCGGTGAGCGCCCTCGGCCTGGACTCGCTCGCCTCCCTGCGGCTCCACCACCACG 45120
EPVSALGLDSLASLRLHHHV

45121 TCCAGTCCGCCTACGGCGTGACCCTGCCCGTCACCGCCCTGCTCGGCGACACCACTTACC 45180
QSAYGVTLPVTALLGDTTYR 

45181 GCCGGCTCGCGGAGCTGACGCTCGCCGCCCCCCGCCCGGCCCGGGCGCCCGAGGGGCAAG 45240
RLAELTLAAPRPARAPEGQV

45241 TCACCGGCGTCTGGCGGCCGTTGACGCACGGGCAGCGCGCCCTGTGGTACGAACAGGCGC 45300
TGVWRPLTHGQRALWYEQAL

45301 TCGCCCCGCACGCGGCCGCCTACCACCTCGTCCGCGCGCTGGCCCTCCGCGGCCCCGTCG 45360
APHAAAYHLVRALALRGPVD

45361 ACGAGGAGGCCCTCGCCGAGGCGGTCCGCCGCGTCGTCCGCCGCCACCCCGCCCTGCGGA 45420
EEALAEAVRRVVRRHPALRT

45421 CCCGCTTCGCGCTCCGCGACGGCGAACCGGCGCGCCGGACCGAGCCGTACGGACCGGAGC 45480
RFALRDGEPARRTEPYGPEL 

45481 TGGACGTACGCGACGCCACCGGCCTGCCGGCGGACCGGCTCCGCGAACACCTGGCCGCGG 45540
DVRDATGLPADRLREHLAAA

45541 CGGGCGACCGCCCCTTCGACCTGGCCGCCGGCGACAGGCCCGTGAGGCTGACGCTCTACC 45600
GDRPFDLAAGDRPVRLTLYR

45601 GCACGGACGGCGGCCACATCCTGCTGCTGGTCGCCCACCACCTGGTCGCCGACTTCTGGT 45660
TDGGHILLLVAHHLVADFWS

45661 CCCTCGTCGTCCTCCTGGGCGACCTCGCCCGGGCCCACGCGGGCGAGGACCTGCCGCCCG 45720
LVVLLGDLARAHAGEDLPPA 

45721 CGCCGGAGGGGGACCCCGGCGACGAGGCGACGGACGCGGACCGGACGTACTGGCGGCACC 45780
PEGDPGDEATDADRTYWRHR

45781 GGCTCGCCGACGCGCCACCCGCCCTCGACCTGCCCACCGACCTCCCCCACCCCGCCGAGC 45840
LADAPPALDLPTDLPHPAER

45841 GCGGCTTCGCCGGCGCCACCCACGCCTTCCGGCTGCCCCCGGACCTCACCGCCCGGCTGA 45900
GFAGATHAFRLPPDLTARLT

45901 CCGCCCTCTCCCGGGAACGGCACTGCACCCTCTTCACCACCCTCCTCGCCGCCCACCAGC 45960
ALSRERHCTLFTTLLAAHQL

45961 TACTGCTCCACCGCCTGACCGGGCAGGACGACCTCGTCGTGGGCACCCTCCTCGCCCGCC 46020
LLHRLTGQDDLVVGTLLARR 

46021 GCGACACCGCCGAAGCGGCCGGCGCCGTCGGCTACCTGGTCAACCCGCTGCCGCTGCGCT 46080 
DTAEAAGAVGYLVNPLPLRS 

46081 CCGTACGGGAGCCGGGGGAGACCTTCACGGAACTGCTGCGCCGCACCCGGCGGACCGTGC 46140
VREPGETFTELLRRTRRTVL

46141 TGGACGCGGTCGCGCACGGCCGCCACCCCTTCGGGCCGCTCGTCTCCCGTCTCGCCCCCG 46200
DAVAHGRHPFGPLVSRLAPA

46201 CGCGCACGCCCGGCCGCGCGCCGCTCCTGCAGAGCCTGTTCGTGCTCCAGCGCGAGTACG 46260
RTPGRAPLLQSLFVLQREYG

46261 GCGACGAGGCGGACGGGTACCGCGCGCTCGCCCTGGGCGTCGGCGGCCGGCTGCGCGTCG 46320
DEADGYRALALGVGGRLRVG

46321 GCGGACTCGACCTGGAGGCACTCGCGTTGCCGCGCCGCTGGTCGCAGCTCGACCTCTCGC 46380
GLDLEALALPRRWSQLDLSL 

46381 TGAGCATGGCGCGGCTCGGGGACGGGCTGACGGGGGTGTGGGAGTACCGCACCGACCTGT 46440
SMARLGDGLTGVWEYRTDLF

46441 TCACCGAGGCCACGGTCGCGGAGCTGAGCGAGGCGTTCGTCCACCTGCTGCGGGCGGCCG 46500
TEATVAELSEAFVHLLRAAV

46501 TCGAGGACCCGGGCGCGCCCGTGGAGACGCTGCCGCTCACCGGCGGCCGGGAGACCGGGC 46560
EDPGAPVETLPLTGGRETGP

46561 CGCGCCGCGGCCCGTCGGCGGCCCGGCCCGCCCTCCCGCTGCACCGGCTCGTGGCCGCGG 46620
RRGPSAARPALPLHRLVAAA

46621 CGGCGCGCCGCGATCCCGCACGGACGGCGGTCGTCGCACTCGCCCCGGACGGCACCGCCC 46680
ARRDPARTAVVALAPDGTAH

46681 ACCACATCAGCCACGGAGCCCTGCACCGCGCGGCCACCACCCTCGCCGCCCGGCTCCGCC 46740
HISHGALHRAATTLAARLRR

46741 GGGAGGGCGCCGGCCCGGAGCGGCCCGTCGCCGTGCTCGTCGAGCGGGGCCCCTGGCTGC 46800 
EGAGPERPVAVLVERGPWLP 

46801 CCGTCGCCTACCTCGGCATCCTGCACGCCGGGGCCACCGTGCTGCCCCTGGACCCGGAGG 46860 
VAYLGILHAGATVLPLDPED

46861 ACCCCCCGCACAGGCTCGCCCGGACGATCGCGAACTCGGGGGCGCGGCTGCTGCTCACCG 46920
PPHRLARTIANSGARLLLTE

46921 AGACCGGGACCGCCTCGCGCGCGGCCGAGGCGGCCGGTCCCGGCGTACGCGCGCTGACCG 46980
TGTASRAAEAAGPGVRALTV

46981 TGCGTGAGGGTGCCACCGGCGGCGAGCGGTTCTCGGCGGACGTCCACCCCGAGCAGTCCG 47040
REGATGGERFSADVHPEQSA

47041 CGTACCTGCTGTACACCTCCGGGTCGACGGGCGACCCCAAGGGCGTGCTCGTCCCGCACC 47100
YLLYTSGSTGDPKGVLVPHR

47101 GGGCCATCGTCAACCGCCTCCTGTGGATGCAGGAGACCTACCGGCTGCGCCCGGGGGAGC 47160 
AIVNRLLWMQETYRLRPGER 

47161 GGGTCCTGCACAAGACGCCGGTGACGTTCGACGTCTCGATGTGGGAGCTGCTGTGGCCGC 47220
VLHKTPVTFDVSMWELLWPL 

47221 TGACCGCCGGGGCGACCGTCGTCATGGCCCGGCCCGGGACCCACCGCGACCCCGCGCGAC 47280 
TAGATVVMARPGTHRDPARL 

47281 TCGTCCGGCGGATCGCCCGCGAGGCCGTCACCACCGTGCACTTCGTCCCCTCGATGCTCA 47340 
VRRIAREAVTTVHFVPSMLT 

47341 CCCCGTTCCTCACCGAGCTCGCCCGCGGCACGACGCGGCTGCCCGCGCTGCGGCGCGTGG 47400 
PFLTELARGTTRLPALRRVV 

47401 TGTGCAGCGGGGAAGAGCTGCCCGCGGCCGCGGTGAACCGCGCCGCCGGACTCCTCGACG 47460 
CSGEELPAAAVNRAAGLLDA 

47461 CCCGGCTGTACAACCTCTACGGCCCGACCGAAGCCGCCGTCGACGTCACCGCCTGGCCCT 47520
RLYNLYGPTEAAVDVTAWPC

47521 GCCGCCCGCCCGAGCCGGGGCCGGTGCCGATCGGCCTGCCCATCGCCAACACCACCACCG 47580
RPPEPGPVPIGLPIANTTTE

47581 AGGTCCTCGACGGCCGGCTGCGCCCGCTGCCCCGCCCGGTGCCCGGCGAGCTGTACCTGG 47640
VLDGRLRPLPRPVPGELYLG 

47641 GCGGCGCCTGCCTGGCCCATGGCTACCACCACGACCCGGCCCTGACCGCCGCGCGCTTCC 47700 
GACLAHGYHHDPALTAARFL 

47701 TTCCGGCCCCCGGCGGCGGGCGCCGCTACCGCACCGGGGACCTCGTCCGCCAACGGGCCG 47760
PAPGGGRRYRTGDLVRQRAD

47761 ACGGGGCACTGGTGTTCCGGGGACGCACGGACGACCAGGTGAAGATCGGCGGCATCCGGG 47820
GALVFRGRTDDQVKIGGIRV

47821 TCGAGCCCGGCGAGGTGGCGGAGGCGCTTCGGGCCCTGCCCGGCGTCGCCGACGCCGCGG 47880
EPGEVAEALRALPGVADAAV

47881 TCGTCCCGCACGACGGGCGGCTGGCGGCGTACGCGGTCGCCGACCCGGTCGGCCCGGCCC 47940
VPHDGRLAAYAVADPVGPAP 

47941 CGGCGGCGGACGCCCTGCGGGACGCGCTGCGCAGGCGGCTGCCCGGCCACCTGGTGCCCG 48000 
AADALRDALRRRLPGHLVPA 

48001 CCGCCCTCACCCTGCTGGACCGGCTGCCCCTCACCCCGGCGGGCAAGCTCGACCGCCGGG 48060
ALTLLDRLPLTPAGKLDRRA

48061 CGCTGCCCCACCCGTCGGCCCCGCCCCCGGACGGCGGACGGCCGCCCACGACCGGGACCG 48120
LPHPSAPPPDGGRPPTTGTE

48121 AACGGCTCGTCGCCCGGGTGTGGGCCGAACGCCTCGGACGGGAAGTCGTCGGCGTGGACC 48180
RLVARVWAERLGREVVGVDR

48181 GGGACTTCTTCTCCCTGGGCGGCGACTCCGTCCGGGCCCTCGGCGTGACGGCGGCCCTGC 48240
DFFSLGGDSVRALGVTAALR

48241 GCGCCGCCGGGCTCCCGGTGACGGTCACCGACCTCCTGCGCCTGCCCACCGTGGCCGCCC 48300
AAGLPVTVTDLLRLPTVAAL

48301 TCGCCCGCCACGCCGACGAGCGGGCGGATCGCCGACCGGCGCGACAGGAGACGCCCCCCG 48360
ARHADERADRRPARQETPPG

48361 GGCCGTTCGCCCTCTGCCCGGAAGCCGCCGGCGTGCCCGGCCTGGAGGACGCCTACCCGA 48420
PFALCPEAAGVPGLEDAYPM

48421 TGTCGATGGCCCAGCGGGCCGTGCTCTTCCACCGTGACCACAACCCCGGCTACGAGGTCT 48480
SMAQRAVLFHRDHNPGYEVY 

48481 ACGTCACCAGCGTCGCCGTCTCCACGCCCCTGGACCGCACACGGCTCGCCGCGGCCGTGG 48540 
VTSVAVSTPLDRTRLAAAVD 

48541 ACCGGCTGCTGGACCGGCACGCCTATCTGCGGTCCTCCTTCGACCTCGTGTCCCACCCGG 48600
RLLDRHAYLRSSFDLVSHPE

48601 AGCCCACCCAGCTCGTCTGGACCCACCTGCCCACCCCGCTCGAGGTGGTGGAGTCGTCCG 48660
PTQLVWTHLPTPLEVVESSD 

48661 ACCCCGCCGGTTTCGACGCGTGGCTGCACGCCGAACGCAAGCGCCCCCTCGACGTCGGCA 48720 
PAGFDAWLHAERKRPLDVGT 

48721 CCGGACCGCTGGCCCGGTTCACCGCGCACGACGCGGGAGCCGCCGGATTCCGGCTGACCG 48780
GPLARFTAHDAGAAGFRLTV

48781 TCAGCAGCTTCGCCCTCGACGGCTGGTGCGTGGCCACCGTGCTCACCGAACTGCTCCGCG 48840 
SSFALDGWCVATVLTELLRD 

48841 ACTACTGGTCCGCGCTGCGCGGCGCGCCCCTCAGCCTCCCGGCACCCGCCGCCTCCTACC 48900
YWSALRGAPLSLPAPAASYR

48901 GCGAGTTCGTCGCCCTCGAACGCGCCGCCCAACACGATCCGGCGCACCGGGAGTTCTGGC 48960
EFVALERAAQHDPAHREFWR

48961 GGACGGAGCTCGCCGGTGCCCGGCCGCATCCGCTGCCCCGCCGCCCGGTGCCACCGCCCG 49020
TELAGARPHPLPRRPVPPPG

49021 GGCCGGACGGGATCCGCCAGCACCGTCACGTCGTCCCCGTCGAGGACACCGTCGCCAAGG 49080
PDGIRQHRHVVPVEDTVAKG 

49081 GCCTGTCGGCGCTCGCCGGCGAGCTGGGTGTCGGGCTCAAACACGTTCTGCTCGGCGTCC 49140
LSALAGELGVGLKHVLLGVH

49141 ACCTGCGGGTCGTCCGGGCCCTGTCCGGCGACCCCGACGTCATCACGGCCGTGGAGACCC 49200
LRVVRALSGDPDVITAVETH

49201 ACGGCCGCCTCGAACGGCACGACGGCGACCGCGTCCTCGGGGTGTTCAACAACATCCTGC 49260
GRLERHDGDRVLGVFNNILP

49261 CGCTGCGGCAGCGGGTGGACGGCGGGAGCTGGGCCGACCTGGCCCGCGCCGCGCACGCCG 49320 
LRQRVDGGSWADLARAAHAA 

49321 CGGAGGCGCGGACGGGGGAGTACCGCCGCTATCCGCTGGCCCAGGCACAGCGCGACCACG 49380
EARTGEYRRYPLAQAQRDHG

49381 GCGCGGCCGGGCTCTTCGACACCCTCTTCGTGTTCACCCACTTCCACCTCTACCGCGCGC 49440
AAGLFDTLFVFTHFHLYRAL

49441 TGGCCGACCTGGACGGCATGGCGGTCTCCGACCTGCGGGCCCCCGACCAGACCTACGTAC 49500
ADLDGMAVSDLRAPDQTYVP

49501 CGCTCACCGCCCACTTCAACGTCGACGCCACGGACGGCGGCGGCCTGCGGCTGCTGCTGG 49560
LTAHFNVDATDGGGLRLLLE 

49561 AGTCGGACCCGCGGGAGTTCCCCGACGAGCAGGTCGCGGAGTTCGCCGCGTACTACCGCC 49620
SDPREFPDEQVAEFAAYYRR

49621 GCGCGCTGCGGGCCGCCGCCGACGCCCCGCACCGGCCGTACCGGGACACGCCGTTGACGG 49680
ALRAAADAPHRPYRDTPLTD

49681 ACCGGCCGGCCGGTCCGGCGCCGCACCGCGCGGAGCGCTCCGTCCACGCCCTGTTCGCGG 49740
RPAGPAPHRAERSVHALFAA

49741 CCCCGGCCCGGAACCACCCGGACCGGATCGCGCTCGACGGCGAGGACGGGCCGGTCAGCC 49800
PARNHPDRIALDGEDGPVSH

49801 ACGGCGCCCTGGCCCGGCGCGCCGCCCGCCTCGCCGGAACGCTGCGGGCCGCGGGCGCCG 49860
GALARRAARLAGTLRAAGAG

49861 GGCCGGACACCGTCGTCGGGATCTGGGCGCCGCGCCGCGCCGACGCCGTCGTGGCGCTGC 49920
PDTVVGIWAPRRADAVVALL

49921 TGGCCGCCCTCCACGCCGGAGCCGCCTACCTGCCCCTGGACCCGGTCCACCCGCCCCGGC 49980
AALHAGAAYLPLDPVHPPRR

49981 GGCAGCGGCAGGTGCTCACCGAGGCCGGCGCCCGCCTGCTCGTCCTGCCCGCCGGCCTCG 50040
QRQVLTEAGARLLVLPAGLD

50041 ACACCCCGCTCCGGGCCTGCGGCCTGCCCGTCGTGGCCCCGGACGACCTCGGCGCGCCCA 50100
TPLRACGLPVVAPDDLGAPI

50101 TCGCCCCCGTGTCCGTCCACCCGGAGCAGCTGGCGGCGGTCATGGCCACGTCCGGCTCCA 50160
APVSVHPEQLAAVMATSGST 

50161 CCGGGACGCCCAAGACGATCGGCGTCCCGCAGCGCGCCCTGGCCGGCTACCTCCGCTGGG 50220 
GTPKTIGVPQRALAGYLRWA

50221 CGATCGGCCACTACCGCCTCGACGAGGAGACCGTCTCCCCGGTGCACTCCTCGCTGGGCT 50280
IGHYRLDEETVSPVHSSLGF

50281 TCGACCTGACCGTCACCGCGCTGCTCGCACCGCTGGCCGCCGGCGGGCAGGCGCGGCTGA 50340
DLTVTALLAPLAAGGQARLT

50341 CCGACTCCGGCGACCCGGGTGCCCTCGGCGCGGCACTGGCCGCCGGCCACCACACCCTGC 50400
DSGDPGALGAALAAGHHTLL

50401 TCAAGATCACCCCGGCCCATCTGGCCGCCCTCGCCCACCAGTTGGGCGCGCCGACCGCAC 50460
KITPAHLAALAHQLGAPTAL

50461 TGCGCACCGTCGTGGCCGGGGGCGAACCCCTGCACGCCGGCCACGTCCGCGCCCTCCGCG 50520
RTVVAGGEPLHAGHVRALRA

50521 CCTTCGCGCCCGGCGCCCGGCTCGTCAACGAGTACGGGCCGACCGAGACCACCGTCGGCT 50580
FAPGARLVNEYGPTETTVGC 

50581 GCTGTGCCCACGACGTCGCACCGGACCCCGGCGAGGCGCCCATCCCCGTCGGTACCCCGA 50640
CAHDVAPDPGEAPIPVGTPI

50641 TCGCGGGCCTCAGCGCGTGCGTCGTCGACGACGCGCTGCCCGCACCGCCCGGCGTGCGGG 50700
AGLSACVVDDALPAPPGVRG

50701 GCGAGCTGTACATCGGCGGGACGGGCGTCACCCGCGGCTACCTGGGCCGGCCCGCGGCCA 50760
ELYIGGTGVTRGYLGRPAAT

50761 CCGCCGCCGCCTACGTGCCGGACCCTGCCGCCCCCGGCGCCCGCCGCTACCGCACCGGCG 50820
AAAYVPDPAAPGARRYRTGD

50821 ACCTGGCACGCCGGCTGCCGGACGGCACCCTGCTCCTGGCGGGGCGCGCCGACCGCCAGG 50880
LARRLPDGTLLLAGRADRQV 

50881 TGAAGATCCGCGGCCACCGGGTGGAACCGGGGGAGGTCGAGCAGGTGCTCGGCGGCCAcc 50940
KIRGHRVEPGEVEQVLGGHP 

50941 CCGGGGTGCGGGAGGCGGCGGTCGTCGCCCACCCGGCACCCGGCGGCGGCCGCCGGCTGG 51000
GVREAAVVAHPAPGGGRRLV

51001 TCGCGTACTGGGTACCGGCCGAACCGGCCCGGCCACCGTCCGCGGACGCGCTCACCGCGC 51060
AYWVPAEPARPPSADALTAL

51061 TGCTCGCCGACCGGCTGCCGCCGTACGCGGTCCCCGCCGAACTCGTCCGCCTGCCCGCCC 51120
LADRLPPYAVPAELVRLPAL 

51121 TGCCCACCACCCCCAACGGCAAGGTCGACCACACCCGGCTGCCCGCGGCCGGACGGGACC 51180
K V D



P A A G R D 



51241 

51301 
51361 
51421 
51481 

51541 



CCGGCCCGCCGGGGCCCCGCCGGTTCCGCTGGCCCCGGCGGAAGCCCGCCCGTCCCGCAC 
GTGCCGGTGCCCGGGCATGACGACCGCGTCGGACGGCTGCCGGCGGACCGGAGCGTCCCG 
CCGACCCGCCGATTCTCTGGGGACCCCGCCGGTTCCGGTGGTGGCCCGCCCGTCCCGCAC 



TCGGACAACCAGCGCGCCCTGCTGGACCGCTGGCTCGCCGAGGACCCCGCCGGCGGTGCC 
SDNQRALLDRWLAEDPAGGA

52681 GTCTCCGCGGCGACGGGCCGGGAGCTGACCGGCTTCTGCCGCCGCCACGGGCTGACCCCG 52740
VSAATGRELTGFCRRHGLTP 

52741 GCCGCCGTGCTGCACGGCGGCTGGGCGGTGCTGCTGTCGCTGCACTGCGGCCAGGACGAC 52800
AAVLHGGWAVLLSLHCGQDD 

52801 GTGGTCTTCGGCACCACCCTCTCCGGCCGCCCCGAGGACCTGCCCGGCGTGACCGAGTGC 52860 
VVFGTTLSGRPEDLPGVTEC 

52861 GTCGGCCTCTTCATCAACACGCTTCCCCTGCGGGTCCGTTGCGGGGAGGACACGGACGTC 52920
VGLFINTLPLRVRCGEDTDV

52921 GTCGACTGGCTCCACGGCGTCCAAAGCGACCTGGCCGCCCTGTGGGACCACGCGCACGTc 52980
VDWLHGVQSDLAALWDHAHV

52981 CCGCTCAGCCGCGTcGAGCGCGGTCTCGGACTGGGCCGGgGCGGCGGGCTGTTCGACAGC 53040
PLSRVERGLGLGRGGGLFDS

53041 ATCATGGTCGTCGAGAACTTCCCCGCCGCCGTCGCCGACGGCCACGAGGCGGgCGGGCTG 53100
IMVVENFPAAVADGHEAGGL

53101 CGGGTGACGGAGCCCCGGGCACTCGTCGACGAGGGCTACCCCCTCGTACTGGAGGCCACC 53160
RVTEPRALVDEGYPLVLEAT 

53161 ACCGGGGACCGGCCGGTGCTGCACGCCCGCTACGACCCCCACCGCCTCGCCGGCGGGCGG 53220 
TGDRPVLHARYDPHRLAGGR 

53221 GTCCAGGCGCTGCTCGCCGCCTTCGACGACTACCTCCGGGCGGTGACCGCCGACCCGGCC 53280 
VQALLAAFDDYLRAVTADPA 

53281 CGCCCGCTGCCGGACCTCCGCGCGGTCCTGGCCCGCGACCACGCGCGCCGGGACGGCGCG 53340
RPLPDLRAVLARDHARRDGA

53341 GCACGCGGGCGGCGCCGCGCCGCGGACCGCACCCGTCTGACGCTGGCCCGCCGCCGCCCG 53400
ARGRRRAADRTRLTLARRRP

53401 GCGACGACGACCGAGGGAGAGACACCGTGACATGGACCGTGGTGACCGGAGCCGGCGGCT 53460
ATTTEGETP* 

MTWTVVTGAGGF (orf20) 

53461 TCATCGGCTCCCACCTCGTACGCCGCCTCGTCCGGGACGGGCACCGGGTCCGCGGCGTGG 53520
IGSHLVRRLVRDGHRVRGVD

53521 ACCTGGTGCCGCCGCGCTACGGCCCCGGCGAGGCCCAGGAGTTCGTCATCGCCGACCTGC 53580
LVPPRYGPGEAQEFVIADLR

53581 GCGACGCGGCGCAGGCCGCGCGGGCCGTCGCCGGCGCGGACTCCGTCTTCGCGCTCGCGG 53640
DAAQAARAVAGADSVFALAA

53641 CCAACATGGGAGGCATCGGCTGGACCCACACCGCGCCCGCCGAGATCCTCCACGACAACC 53700
NMGGIGWTHTAPAEILHDNL

53701 TGCTGATCTCCACCCACACCATCGAGGCATGCCGGGCCGCCGGCGTGCGCACCACCGTCT 53760
LISTHTIEACRAAGVRTTVY

53761 ACACCTCCTCGGCCTGCGTCTACCCCGCGTCCCTGCAGCGCGAGCCCGACGCCGCGCCGC 53820
TSSACVYPASLQREPDAAPL

53821 TGGCCGAGGACCCGGTCTTCCCCGCGGAACCCGACATGGAGTACGGCTGGGAGAAGCTGA 53880
AEDPVFPAEPDMEYGWEKLT

53881 CCACGGAAATCCTGTGCGGCGCCTACCGCCGCAGCCACGGCATGGACATCAAGACAGCCC 53940
TEILCGAYRRSHGMDIKTAR 

53941 GGCTGCACGCCATCTACGGCCCCCTCGGCACGTACACCGGGCCCCGCGCGAAGTCCCTGT 54000 
LHAIYGPLGTYTGPRAKSLS 

54001 CGATGCTCTGCGACAAGGTCGCCCGGATACCCGGCGACGAGGGGGAGATAGAGGTCTGGG 54060
MLCDKVARIPGDEGEIEVWG 

54061 GGGACGGGACGCAGACCCGCTCCTACTGTTACGTCGACGACTGTGTCGAAGGGCTGATCC 54120 
DGTQTRSYCYVDDCVEGLIR 

54121 GGCTCGCCCGCTCCGACGTGGCGGAACCGGTCAACATCGGCTCCGAGGAGCGCGTCGACA 54180
LARSDVAEPVNIGSEERVDI

54181 TCGCGTCGCTCGTCGAGCGGATCGCCGGGGTCGCCGGGAAGAAGGTGCGCTGCGCCTTCG 54240
ASLVERIAGVAGKKVRCAFA 

54241 CCCCCGACCGCCCGGTCGGGCCCCGCGGGCGCGTCTCGGACAACACCCGCTGCCGCGAAC 54300 
PDRPVGPRGRVSDNTRCREL 

54301 TGCTCGGCTGGGCACCGGAGACGTCCCTCGCGGCCGGCCTGGAGCGCACCTACCCGTGGA 54360 
LGWAPETSLAAGLERTYPWI 

54361 TCGAGCGCCAGGTCCTCGCCGAGGCCGGGAGGGCCGATGCCTGAGCACCGCACACCGGTG 54420

M (orf19)

ERQVLAEAGRADA*

54421 AAGGACCTCGGCCGGCTGCTGCTCGGGCACGCCGCGCGCTTCCGGGGCCGCGAGCTGCAG 54480 
KDLGRLLLGHAARFRGRELQ 

54481 GACGTCGCCACCCGGGCGCTGCGGGCCTCCGGCGGGGAGAACGCCTGGGTGGTGTCCGTC 54540
DVATRALRASGGENAWVVSV

54541 GTCAACACCAGTCTCCGCGCCCGCCAGGCCGTGGACCACGCGCTGCGGCTCGCCCCCCGC 54600
VNTSLRARQAVDHALRLAPR 

54601 CGCGGGCTCTCCCGGCTGCGCTACCCGTTCTCCGCCGCCCACCACACGGCCACCCCGCCC 54660 
RGLSRLRYPFSAAHHTATPP 

54661 CGGACCCTGTCGCTGCTGTGCCCGACCCGCGAACGCGTCGGCAACGTCGAACGCTTCCTC 54720 
RTLSLLCPTRERVGNVERFL 

54721 GACAGCGTCGCCCGCACCGCCGCCGCGCCCGGCCGGATAGAGGCCCTCTTCTACGTCGAC 54780
DSVARTAAAPGRIEALFYVD 

54781 GACGACGACCCCCAACTCCCTGCCTACCACGAGCTGTTCGAGCACGCCCGGTGGCGCTAC 54840 
DDDPQLPAYHELFEHARWRY 

54841 GGACGGATCGGCCGGTGCGCCCTGCACGTCGGCGCCCCCGTCGGCGTACCCCACGCCTGG 54900
GRIGRCALHVGAPVGVPHAW

54901 AACCACCTGGCCCGGAACGCGGCCGGCGACGTGCTGATGATGGCCAACGACGACCAGCTC 54960
NHLARNAAGDVLMMANDDQL

54961 TACATCGACTACGGCTGGGACACCGCCCTCGACGCCCGCGTCACCGAACTGAGCGCCCTG 55020
YIDYGWDTALDARVTELSAL

55021 CACCCCGACGGCGTCCTGTGCCTGTACTTCGACGACGGCCAGTACCCCGAGGGCGGCTGC 55080 
HPDGVLCLYFDDGQYPEGGC 

55081 GACTTCCCGATGGTGACACGGCCCTGGTACGGCACCCTCGGCTACTTCACCCCGACGATC 55140
DFPMVTRPWYGTLGYFTPTI 

55141 TTCCAGCAGTGGGAGGTCGAGAAGTGGGTCTTCGACATCGCCGACCGGCTGCACCGGCTC 55200 
FQQWEVEKWVFDIADRLHRL 

55201 TACCCCGTCCCCGGCGTCCTCGTCGAACACCGGCACTACCAGGACTACAAGGCACCCTTC 55260 
YPVPGVLVEHRHYQDYKAPF 

55261 GACGCCACCTACCAGCGGCACCGGATGACACGGGAGAAGTCCTTCGCCGACCACGCCCTG 55320 
DATYQRHRMTREKSFADHAL

55321 TTCCTGCGCACCGAGCCGGACCGCGAGGCGGAGACGGACAGGCTGCGGGCCGTCATCGCC 55380
FLRTEPDREAETDRLRAVIA 

55381 CGGGCAGGGAACACCCCGGACGCCGACCACGCCGACCATGCCGTTCACGACGCGGAGACC 55440 
RAGNTPDADHADHAVHDAET 

55441 TTCTGGTTCACCGGCCTCCTGCGCGAGTCCCACGCCAAGCTGCTCGCGGAACTCGACGAC 55500 
FWFTGLLRESHAKLLAELDD 

55501 GCGCCGGGCCCGGCCGCCGGAGCCGTGCTCTTCGCCGACGGCTCCTGGACCGGCGTCGCC 55560
APGPAAGAVLFADGSWTGVA

55561 TACCGCACCCACCCGCTGGCCACCGCCCTGCTCGCCTCGATCCCCGAGGCCACCCTCGAC 55620
YRTHPLATALLASIPEATLD

55621
55681 
55741 



55860 



GCCCTCACCTTCGTGGTGCCGCGCCCGGCACCGGGGGAGTGAGGCCCGTGTGCGGCATCG 
ALTFVVPRPAPGE* 

MRPVCGIV 



55920 
(orf18)

55980 



CCCTCGGCCACACCCGGCTCGCCGTGATCGCCCCCGACGCCGGACGCCAGCCGGTCGCCG 
LGHTRLAVIAPDAGRQPVAG 



TGCTCTCCTGCGGCGCCCCCGCCCGCTGGGACACCGCCGCCTTCGCCGCGCACCTGCAGC 
LSCGAPARWDTAAFAAHLQL 

TCGGCCTGCCCCCCGACCGCACCCTCTTCGCCGGCATCCGGCAGCTCCCGCCCGGCTGCC 
GLPPDRTLFAGIRQLPPGCH 



56040 



ACCTGCGGGACGTCGTCCGCGCCGGCGAGATGGTGCAGGAGAACTCGCACGGCATCGCCC 
LRDVVRAGEMVQENSHGIAR 



ACCGCCACCTGGCCGTCACCCAGCCCGTCGCCGCCCTGCTCCGCCCCGACTTCGCCGCCG 
RHLAVTQPVAALLRPDFAAE 



TGCTCGCCGCCGAACGCCTCGACGCGGCCCAGGCCGTCGAGGTGCGGCTGCCCCTCTTCG 
LAAERLDAAQAVEVRLPLFD 



GCCGCAAACAGGGCTTCCTCGCACCTCCGATGGCCGACGACGACACCCTCCTCGACGCCC 
RKQGFLAPPMADDDTLLDAL 



57841 
57901 



TCCGCGCCCTGCTGGACCGGCTGGCCGCCGCACCCCCGGGGCAGCGGTCCGGCGGCGAGA 
RALLDRLAAAPPGQRSGGEK 

AACTCCTCCAACTCGTCGCGAGCACCGCCGAACTGGCCGACGAGTTCGGCCTCACCACCG 
LLQLVASTAELADEFGLTTA 

CCCCCAGCGGGCAGAAAGGCGGCAACGGTGGCTGACCTCGATCCCGGCACGCTCTCCGAG 
PSGQKGGNGG* 

GCCGAGCTGACCGCCCGGATCGCCGCCCTGTCCCCCGAACGCCGGGCGGCGTTCGAGAAG 

ATGCTGCACGGCGCCGCGCACCCCCGCCCCGGCATCCCGCGCCGCGGCGCCACCGCGGCA 
MLHGAAHPRPGIPRRGATAA

CCGGCCTCCTACGGCCAGGAACGCCTGTGGCTGCTCACCGGGCTGCTGCCCACCGCCTAC 
PASYGQERLWLLTGLLPTAY 



GACCTCATCCAGGTCGTCCACCCCACGGCGGACGTCCCCGTGCGCCTGGCCGACCTCACC 
DLIQVVHPTADVPVRLADLT 



CTGCTGGCCGTCCACCACGCCGTCACCGACGGCTGGTCCAACGGCGTCCTCGTGACCGAA 
LLAVHHAVTDGWSNGVLVTE 

CTCGCCACCGGCTACCGGGAACTGCGCGCCGGACGCCCCGACCGGCGGCCCGCCCCGCCG 
LATGYRELRAGRPDRRPAPP 

GTCCAGTACGGCGACTACGCGCACTGGCAGCGCGAGCGGCTGACCGGGCCCGAACTGCGG 
VQYGDYAHWQRERLTGPELR

58561 GACCGCCCCCGCCCCGCCGCCCGGCGCGGCGAGGGCGCCAACCACGCCCTGCTGCTCTCG 58620
DRPRPAARRGEGANHALLLS

58621 CCGGAGCTGACCGGCCGGCTCGCCGACCTGCGCCGCCGCGAGGGCGGGTCGCTGTTCATG 58680
PELTGRLADLRRREGGSLFM

58681 CTCGTGCTCTCCGCGCTCCTGGTCGTCCTGCGTGGCACCGGCGGCCGGGACCGGCTCGCC 58740
LVLSALLVVLRGTGGRDRLA

58741 GTCGGCACCCTCGTCGCCGGCCGCACCCGCCCCGAACTCGAGCCGCTCATCGGCTACTTC 58800
VGTLVAGRTRPELEPLIGYF

58801 GTCAACGTCCTGCTGCTGCCCTTCGAGACCGGCGGCCGGACCTCCTTCGCCGAGCTGTGG 58860
VNVLLLPFETGGRTSFAELW

58861 CGGCGGGTCCGCGGCCGGCTGGTGGAGGCGTACGCCCACCAGGAACTGCCGCTGGAGAAG 58920
RRVRGRLVEAYAHQELPLEK

58921 GCCCTGGAGCTGCTGCGCGCCGACGGCACCGCCCCCGCCGACCCGCCGGTCGGCGTGGTC 58980
ALELLRADGTAPADPPVGVV

58981 TGCGTCGCCCAGCAGCCCGCCCCCGCGATCACCCTGCCCGGACTCGACGCGAGCGTCGAG 59040
CVAQQPAPAITLPGLDASVE

59041 GACGTCGACCTGGGCACCGCCCAGTTCGACCTCGTCGTCGAGGTGCGCGAACGGCCGGAA 59100
DVDLGTAQFDLVVEVRERPE 

59101 GGCGTGCAGATCGCCTTCCAGTACGACCGGGACCTGTTCGACGCGGCCACGGTCCGGCTC 59160 
GVQIAFQYDRDLFDAATVRL

59161 CTCGCCGACCACGTGCACGCCGTCCTCGACCAGGCCGCCGCCGACCCCACCCTGCCCTGT 59220 
LADHVHAVLDQAAADPTLPC 

59221 GCCGAGCTGCCCGCCCCGCCGGCCCCCGCGGCCCCGGCCCGCACGGCCGGCGCCACGACG 59280 
AELPAPPAPAAPARTAGATT 

59281 CTGCACGCCCTGTTCGAGTCCCGCGCCGCGAAGAGCCCCGACGCGGTCGCCCTCGTCGAC 59340 
LHALFESRAAKSPDAVALVD

59341 GGCGGCCACCGCGTCACCTACCGGACCCTCAACACCCGCGCCAACCGGCTCGCCCGCCAC 59400
GGHRVTYRTLNTRANRLARH

59401 CTGCGCGCGGTCGGCGTGCGTACCGAGGACCGGGTGGCGCTGCGCCTGCCCCGCGGCACC 59460
LRAVGVRTEDRVALRLPRGT

59461 GACGCGGTGACCGCCACCCTCGCCGCCCTCAAGGCCGGCGCCGCGTACGTACCCCTCGAC 59520
DAVTATLAALKAGAAYVPLD

59521 CCCGCCCTCCCCGAGGAACGGCTGACCCGCGTCCTCGCCGACGCCCGCCCCGCCGTGGTC 59580
PALPEERLTRVLADARPAVV

59581 CTCACCCCCGCGTATCTGCACGACCGGTCCGCCGAGATCACCGCCCACGCCGGCCATGAC 59640
LTPAYLHDRSAEITAHAGHD 

59641 CTCAACCTCCCCGTCCACCCCGACAACCTCGCCTACCTCCTCCACACCTCCGGATCCACC 59700 
LNLPVHPDNLAYLLHTSGST 

59701 GGCACCCCCaAGGGCGTCCTCGGCAcCCACCGGGGCGCGGTCAACCGCGTCGACTGGATG 59760
GTPKGVLGTHRGAVNRVDWM

59761 AGCACCGCGTACCCGTTCCGGACCGGCGACGTGGCCGTCGCCCGCACCGCGCCCGGCTTC 59820
STAYPFRTGDVAVARTAPGF

59821 GTCGACGCGGTCTGGGAACTCTTCGGCCCCCTGGCCGCCGGCGTCCCCCTCGTCCTCCTG 59880
VDAVWELFGPLAAGVPLVLL

59881 CCGACCGACGAGGCGCGCGACCCGGCCCTGCTGACGGCGGCGCTGGAACGGCACCGGGTG 59940
PTDEARDPALLTAALERHRV

59941 AGCCGGATGGTGACGGTCCCGTCGCTGCTGACCATGCTCCTGGACGAGTCCGCCCGCGCG 60000
SRMVTVPSLLTMLLDESARA 

60001 ACGGACCTCGGCACCCGCCTGGCCTGCCTCCGCACCTGGATCACCAGCGGCGAGCCCCTG 60060 
TDLGTRLACLRTWITSGEPL

60061 CCGCCCGCGCTCGCCCGGCGGTTCCACGACCGCCTGCCCGGCCGCACCCTGCTGAACCTG 60120
PPALARRFHDRLPGRTLLNL

60121 TACGGCTCCTCCGAGACCGCCGCCGACGCCACCGCGGCCCGCATCGACCCGGCGCCCGGG 60180
YGSSETAADATAARIDPAPG

60181 ACTGCGCTCCCGGAGCGGTCCCCGATCGGCACGCCCATCACCGGCGTCAGCGCCCTCGTC 60240
TALPERSPIGTPITGVSALV

60241 CGCGGCCCGGACCTGCGCCCGCTGCCCGCGCTGATGCCCGGCGAGCTGTACGCCGGGGGC 60300
RGPDLRPLPALMPGELYAGG 

60301 GCGTGCGTGGCCCGCGGCTACCACGCCCGTCCGGCCGAGACCGCCGCGGCGTTCCCGCCG 60360
ACVARGYHARPAETAAAFPP

60361 GATCCCGACGGCGGGCCCGGCGCCCGGATGTTCCGTACCGGTGACAGGGCCCGGCTGCGG 60420
DPDGGPGARMFRTGDRARLR

60421 GCCGACGGCCGGCTGGAACTCCTGGGGCGCGTGGACCGGCAGGTGCAGATCCGCGGCCAG 60480
ADGRLELLGRVDRQVQIRGQ

60481 CGCGCCGAGCCCGGCGAGGTCGAACACGCCCTGCTGGCCCACCCGGCCGTACGGGCCGCC 60540
RAEPGEVEHALLAHPAVRAA 

60541 GCCGTCACGGCGAACCCCGACGCCACCGGCCTGTGGGCGTACGTGCGGCTCGCTCCCGGC 60600 
AVTANPDATGLWAYVRLAPG 

60601 CCGTTCGCCGCCGGCTCCCCCCAGACCGAGCTGACCGCCTTCCTGCGCCGCACGCTCCCT 60660
PFAAGSPQTELTAFLRRTLP

60661 GCCCACCTCGTGCCCACCGCCGTCACCGTCCTGGACGAGCTGCCGGTGACCGCGCACGGC 60720
AHLVPTAVTVLDELPVTAHG

60721 AAGACCGACCACGCGCGGCTGCCCGCCCCCGACCCCCGGGCCGGGCGCCCCGCCCCGACC 60780
KTDHARLPAPDPRAGRPAPT

60781 GCCCCCCGCACCCCCACCGAGCGTACGGTCGCCGACGTCTTCGCCGGGGTGCTCGGCCTG 60840
APRTPTERTVADVFAGVLGL 

60841 GAGGGGCCGGTCGGCGCGCACGACGACTTCTTCCTCCTCGGCGGGCACTCCCTCCTCGCC 60900
EGPVGAHDDFFLLGGHSLLA

60901 GCCCGCAGTCGCGGCGGAACTCCGCGCCCGCCGCGGCGTCCGGATCGGGCTGAGCGACGT 60960
ARSRGGTPRPPRRPDRAERR

60961 CTTCGCGGCCCCCACCGTCGCCGCAGCGTCGCCGCCCGGACCGACGCCGCCCGGCCCGGC 61020
LRGPHRRRSVAARTDAARPG 

61021 ACCGGCCCCGAGCACACCCCGTTCGTCACCGACCCCGGCGCCCGGCACGAGCCGTTCCCG 61080 
TGPEHTPFVTDPGARHEPFP 

61081 CTCACCGACGTCCAGCGGGCCTACTACGTGGGACGCGAGGGCGGGTTCGCCCTCGGCGGC 61140
LTDVQRAYYVGREGGFALGG

61141 GTCTCCACCCACGCCTACCTGGAGATCGAGGCCCCGCGGATCGACGTCGCACGGTTTACC 61200
VSTHAYLEIEAPRIDVARFT

61201 GGCGCGCTGCGCGGGGTGATCGCCCGGCACCCCATGCTGCGCGCCGTGATCCGTCCCGAC 61260
GALRGVIARHPMLRAVIRPD

61261 GGGCTCCAGCAGGTGCTCACCGACGTCCCCCCGTACGACGTGGCCGTGCACGACCTGCGC 61320
GLQQVLTDVPPYDVAVHDLR

61321 GACCTGGACGAGCCCGCGCGGCAGCGCCGACGCGCCGCGCTGCGCGAGGAGATGTCCCAC 61380
DLDEPARQRRRAALREEMSH

61381 CAGGTGGTGCCCGCCGACCTCTGGCCCCTGTTCGACGTCCGCGTCTCCCTCGGCCCCACG 61440
QVVPADLWPLFDVRVSLGPT

61441 GACGCCCTCGTCCACGTGGGGGTGGACGCGCTGATCTGCGACGCCCACAGCTTCGGCCTC 61500
DALVHVGVDALICDAHSFGL

61501 GTCCTGGCCGAACTCGCGGCCCGTTACGCCGACCCCGCACGCCGCTTCCCGCCCCTGACG 61560
VLAELAARYADPARRFPPLT

61561 GCGGACTTCCGGGACCACGTCCTCCATCAGGAGGCGCTCCGCGGAACCGCCGAGTACGCG 61620 
ADFRDHVLHQEALRGTAEYA 

61621 GCGGCGGAGCGGTACTGGCGCGAACGCCTGCCCGAGCTGCCGCCCGGCCCCGAACTGCCC 61680 
AAERYWRERLPELPPGPELP 

61681 CTGGCCGTCGCGCCCGAGACCCTCGGCACCCCGCGCTTCACCCGCCGCTCCGGCCGGCTG 61740 
LAVAPETLGTPRFTRRSGRL 

61741 GACGCGGCCTCCTGGACGGCGGTCAAGGACCGGGCCCGCCGCGCCGGGCTCAGCCCCTCC 61800 
DAASWTAVKDRARRAGLSPS

61801 GGCGTACTGCTGGCGGCGTTCGCCGAGGTGATCACCGCGTGGAGCGGCCGGCCGCGCTAC 61860
GVLLAAFAEVITAWSGRPRY 

61861 TCGCTGATGCTGACGGTCTTCGACCGCCCGCCGCTCCACCCGGACCTCGGGCGGATCGTC 61920 
SLMLTVFDRPPLHPDLGRIV 

61921 GGCGACTTCACCTCGCTCAGCCTGCTGGAGGTCGACCACAGTCGGCCCGGCGACTTCACC 61980 
GDFTSLSLLEVDHSRPGDFT 

61981 GACAGGGCCCGCGCCCTCCAGCGCCGCCTGTGGCAGGACCTCGACCACCTGGCGGTCGGC 62040
DRARALQRRLWQDLDHLAVG

62041 GGCGTGACGGTGACACGGGAACGGGCGCTGCGCCACGACGCCCGACCCGGTCTGCTCACA 62100
GVTVTRERALRHDARPGLLT 

62101 CCCGTCGTCTTCACCTCCGACCTGCCTGTCGGCGAGACCGCGGCCGAGGACGCGGACGGG 62160 
PVVFTSDLPVGETAAEDADG 

62161 GGAGAGGGATGGGCGCTCGGAGAGCCCGTCTACGGCGTCAGCCAGACCCCGCAGGTCCAT 62220
GEGWALGEPVYGVSQTPQVH

62221 CTCGACCATCAAGTCGCCGAAGACCGAGGGGAGTTGGTCTTCAACTGGGACGCCGTGGAA 62280
LDHQVAEDRGELVFNWDAVE

62281 GACCTGTTCGCCCCGGGCGCCCTGGACGCCATGTTCGCCGCCTACACCGCCTCGCTGACC 62340
DLFAPGALDAMFAAYTASLT 

62341 CGCCTGGCCCGGAGCCCCGAAGCCTGGCGGCGGCCCGGCACGCCGCCGCTGCCCACCGCC 62400 
RLARSPEAWRRPGTPPLPTA

62401 CAGGCGGCCGTGCGCCGGCGCACCGCCGCGACCGAGGCGCCCCTGCCCGCCCGCCTGCTG 62460
QAAVRRRTAATEAPLPARLL

62461 CACGAGGCCGTCGGCGACGCGGCCCGGCGCCACGCCGACCTGACCGCCCTGGTCGACGGC 62520
HEAVGDAARRHADLTALVDG

62521 GACACCCGGATGACCTACCGGCGACTGACCGAGCACGCCCGGCGCGTCGGCCGCACGCTG 62580
DTRMTYRRLTEHARRVGRTL 

62581 CGCCGCCTCGGCGCCCGCCCCGGCCGCCTGGTCCCGGTGGTCGCCCGCAAGGGGTGGCGG 62640
RRLGARPGRLVPVVARKGWR

62641 CAGGCCGTCGCCGCGCTGGGCGTCCTGGAGTCGGGGGCGGCGTACCTGCCCCTGGACCCC 62700
QAVAALGVLESGAAYLPLDP

62701 GAACTGCCCGCCGAACGGCTCGTCCACCTCGTACGGCGCGCCGAAGCCGCCCTCCTCCTC 62760
ELPAERLVHLVRRAEAALLL

62761 ACCGAACGCGCCCTGCTGGACACGCTCGCCGTCCCCGTCGGCGTCACCGTGCTCGCGGTG 62820
TERALLDTLAVPVGVTVLAV 

62821 GACGACGACGCGGCCCTCGACGCCGACGGCGGCCCGCTGCAGAGCGTGCAGAACCTCACC 62880 
DDDAALDADGGPLQSVQNLT 

62881 GACCTGGCGTACACCATCTTCACCTCGGGCTCCACCGGCGAACCCAAGGGCGTCATGATC 62940
DLAYTIFTSGSTGEPKGVMI

62941 GACCACCTCGGCGCGGCCAACACCCTGGAATGCGTCAACCGCCGCTTCGGCACCGGCCCC 63000
DHLGAANTLECVNRRFGTGP

63001 GGCGACGCGGTCCTCGCCGTCTCCTCCCCGAGCTTCGACCTCGCCGTCTACGACCTGTTC 63060
GDAVLAVSSPSFDLAVYDLF

63061 GGCGTGCTGGCCGCCGGCGGCACCGTGGTCGTCCCCGCCCACGACCGCCGGCGCGACCCC 63120
GVLAAGGTVVVPAHDRRRDP 

63121 GGACACTGGGCCGAGCTGATCCGGCGCGAGCGGGTCACCCTGTGGAACTCCGTCCCCGCG 63180 
GHWAELIRRERVTLWNSVPA 

63181 CTGGGCACCCTGCTCACCGAGTACGCCGAGGCCCTCGCCCCCGACGCCCTGCGCACCCTG 63240
LGTLLTEYAEALAPDALRTL

63241 CGGGCGGTGCTCCTCAGCGGCGACTGGATCCccctcggactgcccgaccGGATCCGCGCC 63300
RAVLLSGDWIPLGLPDRIRA

63301 CTGTCCGCCCCCGGCGCCACCGTGATGAGCCTCGGCGGCGCGACCGAAGCCTCCATCTGG 63360
LSAPGATVMSLGGATEASIW

63361 TCGGTCTGGTACGAGATCGGGAAGGTGCACGAGGCGTGGAGCAGCATCCCCTACGGCACC 63420
SVWYEIGKVHEAWSSIPYGT

63421 CCCATGGCCAACCAGCGGCTGGAGGTCCTCGACGAGCAGCTGCGGCCCCGGCCCGACTGG 63480
PMANQRLEVLDEQLRPRPDW

63481 GTGCCCGGCGAGCTGTACATCGGCGGCACCGGCGTCGCCAAGGGCTACTGGCGCGACCCG 63540
VPGELYIGGTGVAKGYWRDP

63541 GAACAGACCTCCCTGCGCTTCCCCGTCCACCCGGGCAGCGGGCAACGCCTGTACCGCACC 63600
EQTSLRFPVHPGSGQRLYRT

63601 GGGGACTTCGCCCGCCACCTCCCCGACGGCACGCTGGAATTCCTGGGCCGGCAGGACGAC 63660
GDFARHLPDGTLEFLGRQDD 

63661 CAGGTGAAGATCGGCGGATTCCGGGTCGAACTGGGCGAGGTCGAGGCGGCCCTCGGCCGA 63720 
QVKIGGFRVELGEVEAALGR 

63721 CTGCCCGACGTCGCCGCCGGCGCGGTGATCGCCACCGGTGACCCGCGGGGCGACCGCCGC 63780
LPDVAAGAVIATGDPRGDRR 

63781 CTCGTCGGCTTCGCCGTACCGGCCCGGGAGGGCGGCTTCGACGCGGCCGGGCTCCGACGG 63840 
LVGFAVPAREGGFDAAGLRR 

63841 CAACTCGCCCGGCGGCTGCCCGCCTACATGGTCCCCACGACCCTGCTGCCCCTGGACCGG 63 90 0 
QLARRLPAYMVPTTLLPLDR 

63901 CTGCCGCTGACCGCCAACGGCAAGGTCGACCGGGCCGCACTCCAACGCCTCGTCCCCGGC 63 96 0 
LPLTANGKVDRAALQRLVPG 

63 961 CGCGCACCGGCCCCGGCGGAACCCGCCACCGCCCCACCTGCCCGTTCCCGCGCCGTCCCC 64 02 0 

RAPAPAEPATAPPARSRAVP 

64 021 GTGCCCGGCTGGCTCGCCGACCTGTGGTGCGAACTCCTCGACGTGCCGGAGGCCGACCCC 64080 

VPGWLADLWCELLDVPEADP 

640 81 GACGCGAACTTCTTCGCCCTCGGCGGCACCTCCCGGGTCGCGATCACCCTGGTCACCCGG 6414 0 
DANFFALGGTSRVAITLVTR 

64141 ATCGAGGCCCGACTCGCCGTCCGGGTGCCCCTCGCCCGCCTCTTCGACGCCCGCACCCTG 64200 
IEARLAVRVPLARLFDARTL 

642 01 GGCGGCCTCGCCGAGACGATCGCCGAACTGTCGGCCGCCGCCGAGGAGGAGCCGGCACCC 64260 
GGLAETIAELSAAAEEEPAP 

642 61 GCCGAGCCCGTGTACGCCCCCGACCCCGCCACCCGCCACGAGCCGTTCCCGCTCACCGAC 64 32 0 
AEPVYAPDPATRHEPFPLTD 

64321 ATCCAGCGCGCCTACTGGCTCGGCCGGCACCGCTCCCTCTCCCTTGGCGGCGTCGCCACG 6438 0 
IQRAYWLGRHRSLSLGGVAT 

64 3 81 CACACCTACCTCGAACTCGACGTCGAGGACCTCGACCCCGGCCGGCTCCAGACGGCCCTC 6444 0 
HTYLELDVEDLDPGRLQTAL 

64441 CGCCGGCTGATCGACCGCCACGACGCCCTCCGGCTCGTGGTCCTCCCCGACGGCCGGCAA 64500 
RRLIDRHDALRLVVLPDGRQ 



645 01 CAGATCCTCGGCGACGTACCGCCGTACCTCCTCGCCCACACCGACCTGCGGGGCAGGGCG 6456 0 
QILGDVPPYLLAHTDLRGRA 

64561 GACGCCGAGGCCGAACTGGCCCGCGTCCGCGAGCACATGTCGCACGAGGTGCGCGACGCC 64620 
DAEAELARVREHMSHEVRDA 

64621 TCCCGCTGGCCGCTGTTCGACGTACGGACCCACCGCCTGGACGACGTCCGCACCCGGCTG 6468 0 
SRWPLFDVRTHRLDDVRTRL 

64681 CACCTGAGCTTGGACCTGCTCATCGCCGACGCCCACAGCGTCCACGTACTCACCGGCGAC 64740 
HLSLDLLIADAHSVHVLTGD 

64741 CTGCTCACCTTCTACGCCGACCCCGACGCGGCCCTGCCGCCCCTCGGCTGCTCCTTCCGC 64800 
LLTFYADPDAALPPLGCSFR 

64 801 GACTACGTCCTGGCCGTCCGCGCCCACGCCGAGGGCGAGCCGCGCCGCCGCGCCCTCGAC 64860 
DYVLAVRAHAEGEPRRRALD 

64 861 CACTGGCGGGCCCGGCTGGCCGACCTGCCGGGCCCGCCCGGCCTGCCGCTGCGGTGCCGG 64 92 0 
HWRARLADLPGPPGLPLRCR 

64 921 CCCGAGGAGCTGACCGCGCCGCGGTTCGCCCGCCTCACCACCGGACTCGGCCCCGACGCC 64980 

PEELTAPRFARLTTGLGPDA 

64981 TGGGCACGGCTGCGGCGCGCCGCGGCGGCCGCCGAACTCACCCCGGCCGCACTGATCTGC 65040 
WARLRRAAAAAELTPAALIC 

65041 GCCGCCTTCTGCGACGTCCTCGCCCAGTGGAGCGACACCCCCCGCTTCACCCTCAACCTC 65100 
AAFCDVLAQWSDTPRFTLNL 

65101 ACCACCTTCCACCGCCCCGCCCTGCTCCCCGGCGTGGACGACCTCGTCGGCGACTTCACC 6516 0 
TTFHRPALLPGVDDLVGDFT 

65161 ACCACGACCCTGCTCGGGGTCGACGGCGAGGGGGACACCTTCCGGGACCGGGCCCGCCGA 6522 0 
TTTLLGVDGEGDTFRDRARR 

65221 CTCCAGGACCGCATCTGGGAGGACCTCGAACACCGCGTCGTCAGCGGCGTCGAGGTCCTG 6528 0 
LQDRIWEDLEHRVVSGVEVL 

65281 CGGATGCTGCGCCGCGAGCGGGGCACCCACGACGCCGTCCGGATGCCGGTCGTCTTCACC 65340 
RMLRRERGTHDAVRMPVVFT 

65341 AGCACCCTGCGGGCCGCCGGCCCCGCCCCCCGGACGGCCCCGCCCGCCTGGCGGGTACGG 65400 
STLRAAGPAPRTAPPAWRVR 

654 01 CCCGGCTACGCGATCAGCCAGACCCCGCAGGTCCTGCTCGACCATCAGGTGAGCGAGAGC 654 60 
PGYAISQTPQVLLDHQVSES 

65461 GACGGCCGACTGGTCTGCACCTGGGACTACGTCGCGGACGCCTACCCGCCCGGGCTGATC 6552 0 
DGRLVCTWDYVADAYPPGLI 

65 521 GAGGCCATGTTCGGGGCCTTCGAGGCGCTCCTCGCCTCGCTCGCCGGTCACGACGACGAC 65580 

EAMFGAFEALLASLAGHDDD 

65581 GCCGGCCACGACGACGACGCCGGCCACGACGACGGCCCCGGCCACGACGACGGCCCCGGC 6564 0 
AGHDDDAGHDDGPGHDDGPG 

65641 CACGACGACGGCCCCGGCCACGACGACGGCCCCGGCCACGACGACGGCCCCGGCCGCGAC 65700 
HDDGPGHDDGPGHDDGPGRD 

657 01 GACAGTGCCGATCACGGCCACAGTGCCACGCACGACGACAGCGCCGCCCGAAACGACAGA 65760 
DSADHGHSATHDDSAARNDR 

65761 GAGGGAGGTGGACCGGAGTGACGAGCGCCCGGCCCACGCCGACACTGCTCCCCGCCGACC 65820 
E G G G P E * 

MTSARPTPTLLPADQ (orf16)

65821 AGCGGGAGCTGCTGCGGATGATGAACGACCGCACCGCACCCGTGCCCGCGCACACCCTCA 65880
RELLRMMNDRTAPVPAHTLT 

65881 CCGCCCAACTGGCCGACGCCGCGCGCACGCACGACCGGGCTCTGGCACTGGTGGCACCGG 65940 
AQLADAARTHDRALALVAPG

65941 GTCTGACACTGAGCCACGCCGAACTGGACGCCCGGGCGGCCGCGGTGGCCGCCCGGCTCA 66000
LTLSHAELDARAAAVAARLT 

66 001 CCGCCGCGGGCGTCATCCCCGGGGACCGGGTCGCCCTCGCCGTCGAGTACGGCTGGGAGC 66060 

AAGVIPGDRVALAVEYGWEQ 

66061 AGGTCGTGGGCGCCCTGGCCGCGCTCCGCGCCGGAGCCGTCTGCCTGCCCGTCGCCCCCG 66120 
VVGALAALRAGAVCLPVAPG 

66121 GGCTGCCCCGGCCCGCCCGCTGGCAGCACGCCACCCGGGCCGGGGCGACGGCCGTCCTCA 6618 0 
LPRPARWQHATRAGATAVLT 

66181 CCCAGTCCTGGCTCACCCAGCGCATCGACTGGCCGCAGGAACTGCCCGTCCTCTCCGTGG 6624 0 
QSWLTQRIDWPQELPVLSVD 

66241 ACGAACCCGGGCCGCCGGTACCACCCACCACCGCCCCGGCCGACGGACGGTCCGCGACCG 663 0 0 
EPGPPVPPTTAPADGRSATD 

663 01 ACGCCGCCTACCGGCTGGACGCCCCCGTCAGCCACCGCGCGATCACCACCGCCGCCCTGG 66360 
AAYRLDAPVSHRAITTAALE 

663 61 AGATCGACCGCGCCTTCCGCGTCGGACCCGGCGACCGGCTCCTGGCCCTGGCCCCCGCCG 66420 
IDRAFRVGPGDRLLALAPAD 

6 6421 ACTCGCCGCTCGCTCTCTACGAACTGTTCGGGCCCCTCCTGGCCGGTGCGGCCCTCGTCC 66480 
SPLALYELFGPLLAGAALVL 

6 6481 TCACCCGGGACATCGACCTGCGCGATCCCGGAGCCCTGCACGAGGCGCTGCGCACCCACG 66540 
TRDIDLRDPGALHEALRTHG 

6 6541 GCGTCACCCTCTGGCACTCGCCGCCCGCCCTCCTCGGCCTCCTCCTCGACCACCTCGCCG 66600 
VTLWHSPPALLGLLLDHLAD 

66601 ACCGGGGCGGCAAACTGCCCGAGTCGCTCCGGCTGGTGCTGCTCGGCGGCGAACGCCTCG 66660 
RGGKLPESLRLVLLGGERLD 

6 6661 ACCCCGCCCTCGTCCGCCGCGTCCGCGAGAGCGCCCCGCACCAGCCGGCCGTCGCCCACC 6672 0 
PALVRRVRESAPHQPAVAHL

66721 TCTCCTCGGCCACCCCGTCCGGCCCCTGGACCACCTGCCTGGAGACCGGCGACCTCGCCC 6678 0 
SSATPSGPWTTCLETGDLAP 

66781 CGGAATGGCGCTCGGTCCCCGTCGGCGCGCCCCTGCCCAACCAGCGGGCGCACATCCTGT 6684 0 
EWRSVPVGAPLPNQRAHILS 

6 6841 CCGAGACCCTGCGGCCCTGCCCGGTCTGGGTCACCGGCCGCCTCCACTACGGCGGCGTCG 6690 0 
ETLRPCPVWVTGRLHYGGVA 

66901 CCGCCGAGCCCCCCACCGGAGAGGAGCACGCACCCGCGACCGTCCCGCACCCGGAGACCG 66960 
AEPPTGEEHAPATVPHPETG 

66961 GCGAACCGCTGCTGCGCACCGGGCTGTTCGCCCGCCTGCTGCCCGAGGGCCTGATCGACG 6702 0 
EPLLRTGLFARLLPEGLIDV 

67021 TCGTCGGCGACGAGACCGCCCGGATCAGCGTCCGCGACCGGCCCCTGAACCTCCAGGACA 67 08 0 
VGDETARISVRDRPLNLQDT 

67 0 81 CCGAGACCGCCCTCGCCGCCCACGAGGACGTGCACTCCGCCGTGGTCGTCCCCGTCGGGC 6714 0 

ETALAAHEDVHSAVVVPVGR 

67141 GGGGAGACGAGTCGCTCGCGCGGGTACGGCTCCACCCCGGCGCCACGGCCGGCCCCGACG 6720 0 
GDESLARVRLHPGATAGPDE 

67201 AACTCCTCGCCCATCTGCGCCGCAAGGTCTCCCCTTACCTGCTGCCCGGCCACATCGAGG 67260 
LLAHLRRKVSPYLLPGHIEV 

672 61 TGGGCGGTCCGCTGCCGCTCACCCGGGACGGGCGCGTGGACCGCGCGCGCGTCACCGCCG 6732 0 

GGPLPLTRDGRVDRARVTAE 

673 21 AGGCCCCCGCCCCCGCTGCCGTGCCCGCCGCCGCGCCGGCGGCGTCGGCACCCGCGCGGG 673 8 0 

APAPAAVPAAAPAASAPARD 

673 81 ACGAGGCCGAACTCCTCGCCCAAGTGGCCCGGGTGACCTGCCGGGTGCTGGGAATCGGCG 6744 0 
EAELLAQVARVTCRVLGIGA

67441 CCGTCGAACCCGATATGAACCTGCTCGACGCCGGTGCCACCTCCGTCGAACTCGTCCGCC 67500
VEPDMNLLDAGATSVELVRL 

67501 TGGCGACCGCTCTGGAGGAGGAACTCGGCCTCGACACCGACATCGAGGAACTGCTGGCCT 67560 
ATALEEELGLDTDIEELLAF 

67561 TCCCGTCGGTCGCCGTGATCGTCGGCCGCCACCTCGGCCGCCGGACGGCACCACCGGCCC 67620 
PSVAVIVGRHLGRRTAPPAR 

67621 GGGACCCCCTGCCGCCCGCGTCCGTAGCGTTCGCACCCGGGTCCGTACTGCCCGCGCCGC 67680 
DPLPPASVAFAPGSVLPAPP 

67681 CCGCGCCCGGACCCGTGCCGCCCGCGTCCGTGCCGCCCGCACCCGCGTCCGTACCGCCCG 67740 
APGPVPPASVPPAPASVPPA 

6 7741 CGTCCGAGTCCTCACCGCTCGCGCCGCCCGCACCCGGGCCCGTGCCACCCACGCCCGTCC 67800 
SESSPLAPPAPGPVPPTPVP 

678 01 CGCCCGCCTCCGTCCCGCCCGCGTCCGGGGCCGCGCCGCACGTACCGCCCGCGCCGCCCG 67860 
PASVPPASGAAPHVPPAPPA 

67861 CACCCATCCCCGCGCCCTCCGTGCCccccgcgccccgcccccaaccgcccctgctcaccg 67920 
PIPAPSVPPAPRPQPPLLTG 

67 921 gcatcggcgcccgccaggcgTTCAAGGACGCCCACCACGGCATCCGGCACGAGTTCGACG 6 79 80 
IGARQAFKDAHHGIRHEFDA 

67 981 CCACCGACGGCGTCGCCCTCAGCGGCCCGGACGACCACCACCTCACCGCCCGTCGCAGCC 68040 

TDGVALSGPDDHHLTARRSH 

68 041 ACCACCGCTTCGACCCCGGCCCCGTGACGCTGCCGGACCTGGCCGCCCTCCTCGGGGCCC 6 8100 

HRFDPGPVTLPDLAALLGAL 

68101 TCCGCCGGGTCCGCGGCCCGGGAGGCGAACCCAAATACGCCTATCCGTCGGCCGGTTCCT 6816 0 
RRVRGPGGEPKYAYPSAGSS 

68161 CCTACCCCGTCCAGACCTACCTGCTCGTCCACCCGGGGAAGGTGACCGGACTGCCCGGCG 6 822 0 
YPVQTYLLVHPGKVTGLPGG 

68221 GCAGCCACTACGTCCACCCCGCGCGCAACCGCCTGGTGAGCATCGACCCCACCGCGACCC 682 80 
SHYVHPARNRLVSIDPTATL 

68281 TGCCCGCCGACGCGCACGCCGAGATCAACCGCGCCGCCTACGGGGAGGCGGCCTTCTCCC 68340 
PADAHAEINRAAYGEAAFSL 

683 41 TCTACCTCATCGCCGCGATCGACGCGATCACACCGCTCTACGGCGATCTCTCCTGGGACT 6840 0 

YLIAAIDAITPLYGDLSWDF 

684 01 TCACCGTCTTCGAGGCCGGTGCCATGACCCAGTTGCTGATGCGGACCGCCGTCGGCACCG 68460 

TVFEAGAMTQLLMRTAVGTG 

684 61 GCATCGGCCTGTGCCCCGTCGGCACGATGGACCCCGCGCCGCTGCGCCGCGCGTTCGCCC 68520 
IGLCPVGTMDPAPLRRAFAL 

68521 TCACCGACCGGCACCGCTTCGTCCACGCCCTCCTCGGCGGGCGGCCCCGCACGGAGGCCC 68580 
TDRHRFVHALLGGRPRTEAP 

68581 CGTGAACCGGCACGGCCCCCTGGCGGGCCGGCGGCAGAGCGTCGACACCCGCAGCGCCGC 68 64 0 

MNRHGPLAGRRQSVDTRSAA (orf15)

68 641 GTGGGTGGCGCCGACGGGCACCCCGGGGCTGCCGCTGGAGGTGGCCGCCACCCGGGACGG 68700 
WVAPTGTPGLPLEVAATRDG 

687 01 CGTCGACCCGGCCGAATGGGCCCGCACCCACCTCGACACCGTCACCGGCTGGCTGCACCG 68 760 
VDPAEWARTHLDTVTGWLHR 

6 87 61 TCACGGAGCCGTCCTGTTCCGCGGCTTCGGCGTCGGCCTCGACGGCTTCGGCGACGTCGT 68 820 
HGAVLFRGFGVGLDGFGDVV 

6 8821 CCACGCCCTGGCCGGATCCCCCGAGGCGTACGTCGAACGGTCGTCGCCGCGCACCGCCCT 6888 0 
HALAGSPEAYVERSSPRTAL

68881 CGGGCATCACCTCTACACCGCCACCGACCACCCCGCCGACCAGCCCATCCCCCCGCACAA 68940
GHHLYTATDHPADQPIPPHN

6 8941 CGAGAACTCCTACCAACTCCGCTTCCCCGGACGGCTGGTCTTCGGCTGCCTCACCCCGGC 69000 
ENSYQLRFPGRLVFGCLTPA 

6 9001 CCGGACCGGCGGCGCGACCCCGCTCGCCGACACCCGGCGCGTCCTGGGCCGCCTCGACCC 69060 
RTGGATPLADTRRVLGRLDP 

69 061 CGCCCTCGTCGCCGCCTTCGCCCGCCGCGGGGTGCTCTACCAGCGCAACTACGGCGACGG 6912 0 
ALVAAFARRGVLYQRNYGDG 

69121 GATCGGCATGTCCTGGCAGGACGCCTTCCAGACCCGCGACAAGGCGGCCGTCACCGCCTA 6 9180 
IGMSWQDAFQTRDKAAVTAY 

69181 CTGCGCCGCCCGCCGCGTCGACGTCGAATGGAAACCCGACGGCGGGCTGCGGACCACCCA 6924 0 
CAARRVDVEWKPDGGLRTTQ 

69241 GGTCCGCCCCGCCCTCGCCGTCCACCCGGCGACGGGGGAGCGGGTGTGGTTCAACCACGC 693 00 
VRPALAVHPATGERVWFNHA 

693 01 CGCGTTCTTCCACGTCTCCGCCCGGCCGCCCGCGCTGCGGGACGCCCTGCTGGCCCAGTT 693 60 
AFFHVSARPPALRDALLAQF 

693 61 CGACGAACGCGACCTGCCGAGCCACTCCTGCTACGGCGACGGCCGGCCCATCGAACCCGC 6 942 0 
DERDLPSHSCYGDGRPIEPA 

69421 CGTCATGGAGGAACTGCACCACGCCTACGCCGCCGAACTGGTGGCGCCCGCCTGGCGGGC 69480 
VMEELHHAYAAELVAPAWRA 

69481 CGGCGACGTCCTCCTCGTCGACAACCTCCTCACCGCGCACGGCAGGGAACCCTTCACCGG 69540 
GDVLLVDNLLTAHGREPFTG 

69541 CGAACGCCGCGTCGTCGTCGGCATGGCACAGCCGCTGGACTGGGACGAGGTGAGCGCGTG 69600 
ERRVVVGMAQPLDWDEVSA* 

M (orf14)

696 01 ACCGCCCCCGGCACACCGCTGCCCGCGACCTTCGTCCAGCGCGGCCTGTGGCCGTCCACT 6966 0 
TAPGTPLPATFVQRGLWPST 

696 61 CGCCACGCCCGCCCGGCGGAGGTCACCCACGTCCGCGCCCTGCGCCTGACCGGGGACACC 69720 
RHARPAEVTHVRALRLTGDT 

69721 GACACGGCGCGGCTCACCGAGGCCGTCCGGCGGGTCACCGCCGCCCTCCCCGCCCTCACC 6978 0 
DTARLTEAVRRVTAALPALT 

6 9781 GCCGAACTCTCCGGCGACGAGGAACCCCGCCTGACCCTCCGGCCGGACGCCCCCGAGGTC 6984 0 
AELSGDEEPRLTLRPDAPEV 

69841 ACCCCGGTCGACCTGCGCGGAGCCCCGTCCGCCGGACGCGACGCCGTCTGCGTGGCGCTG 6990 0 
TPVDLRGAPSAGRDAVCVAL 

6 99 01 CTGCGCGCCGACCGGGACCACCCTCGCGCCGGACGCCACCGGGCCCGCTTCCACCTGGTG 6996 0 
LRADRDHPRAGRHRARFHLV 

6 9961 CGGCTCCACGACGACGAGACGGTGCTCGCGCTCACGGCCCACACCCTCCTCCTCGACACA 7 0 02 0 

RLHDDETVLALTAHTLLLDT 

7 0021 CCGTCTCTCTACGCCGTGCTCGGCGCGGTCTGCCAGGCGTACGCCGGCCGCTTCCGCCCC 70080 

PSLYAVLGAVCQAYAGRFRP 

7 0 081 GAGCACTACCGCGACGCCACCACCCTGCCCGACGCGCCCCACGCCCCCCTCTCCGGTCGG 7014 0 
EHYRDATTLPDAPHAPLSGR 

70141 GCCCGGGCCTCCCGCCGGCGCTGGTGGCACCGGCGCCTGGCCGCCCTGCCCGGCCCGGCC 70200 
ARASRRRWWHRRLAALPGPA 

7 02 01 CCGGCCCCCGCCGGCCCGCCCCGCGACCGGGTGACCGAAACCCACCGGCTGCGCATCCCC 7 0260 
PAPAGPPRDRVTETHRLRIP 

7 0261 GCAGCGCGCTGGAAAGCCCTGACCGCCCTGACCGCCCTGGGCGGCCCCCTCGGCGGCAAC 70320 
AARWKALTALTALGGPLGGN 

70321 GGCTCGCTCGCCGTCATGGCCCTGGCCGCCTGGTGCCTGCGCGCCCCGGACCACCGGGGA 70380
GSLAVMALAAWCLRAPDHRG

7 0381 CCGGCCCGCTTCACCACCGTCGTCGACCTGCGCGACCACCTCGGACTCGGGCCCGCCGTC 7 0440 
PARFTTVVDLRDHLGLGPAV 

7 0441 GGCCCGTTCACCGACCGCCTCGTCTTCGGCGCCGACCTCGGCGAAGCGCCGCGCCCCTCC 70500 
GPFTDRLVFGADLGEAPRPS 

70 501 TTCCGGGACGTCACGCTGCGCGCCCAGTCCGGGTTCCTGGACGCCGTCGTGCACTACCTC 70560 
FRDVTLRAQSGFLDAVVHYL 

70561 CCCTACGGCGACGTCGTGGAACTCgGCAGGGAACTGGGCCGCGTCACCGCGCCCCGCACC 70620 
PYGDVVELGRELGRVTAPRT 

70621 GCCGCGCACTGGGACGTGGCGCTGAACTTCTGCCGCAACCCGCCCACCAGCGCCGCCACC 7 0680 
AAHWDVALNFCRNPPTSAAT 

70 681 CGCGGCGAACGCACCCTCGCCGAACGCGGCCTGTCCATCGAGCTGTTCCGCGAGGCCGAC 7 0 74 0 
RGERTLAERGLSIELFREAD 

70741 CTGCTCGGCGCGGCCGGCACCGGTCCCGCGCACCGGTGGGACGGCACGGTGCTCGCCCTC 7 08 0 0 
LLGAAGTGPAHRWDGTVLAL 

70801 TCCCTAGGCGAACTCGGCGACGACACCGTGCTGGTCCTCGACGCCGACCGCGACCACCCG 70860 
SLGELGDDTVLVLDADRDHP 

70 861 CACCACGGAACCGCCGACCGGCTGCTCCACCGGATGGACGAAGCGCTCCTGGCGGCCGTC 7 0920 
HHGTADRLLHRMDEALLAAV 

70 921 GCCGACCCGGACGCCCCCCTGCCCCCCTTGCCCGCCCCCGCGCACACCACGAGGAGCCAC 7 0980 
ADPDAPLPPLPAPAHTTRSH 

70 981 CGATGACCACGACCCCGCGGACCGCCGCCGAGCCCACCTACCACGTGGTGGTCAACGACG 7104 0 

MTTTPRTAAEPTYHVVVNDE (orf13)
R * 

71041 AGGAGCAGTACTCGATCTGGCTCGCCGAACAGGAGATCCCGGCCGGCTGGCGGGCCACCG 71100 
EQYSIWLAEQEIPAGWRATG

71101 GAACCTCCGGCACCCAGGAGGAGTGCCTGCGCCACATCGACGAGGTGTGGACCGACATGC 71160 
TSGTQEECLRHIDEVWTDMR 

71161 GCCCCCGCAGCCTGCGCGAGGCCATGGCCGCGGCGGAGCACGCGGAGCCCGCTCCCGCCC 7122 0 
PRSLREAMAAAEHAEPAPAP 

71221 CGGCCCCGGCCGAGGAGGAGCCGAGCCTCGTCGACCGGCTCTGCGCGGGCGACCAGCCGG 712 8 0 
APAEEEPSLVDRLCAGDQPV 

712 81 TGGAGTCGGTCCTCCGCCCGGAGCGCACGGCCGCCGCCCTGCGGGAGGCCGTCGACCGCG 7134 0 
ESVLRPERTAAALREAVDRG 

71341 GCTACGTCTTCGTCCGCTTCGCCGCCACCCGCGGCGGCACCGAACTCGGCGTCGCCGTCG 7140 0 
YVFVRFAATRGGTELGVAVD 

714 01 ACCCCGCGGCGACCACCATGGACGGCACCGAGCTGCGCCTGACCGGCACCCTCACCCTCG 7146 0 
PAATTMDGTELRLTGTLTLD 

714 61 ACTTCGAACCGGTCCGCTGCCACGCCCGCGTCGACGTGACCACCTTCACGGGCGAGGGCC 7152 0 
FEPVRCHARVDVTTFTGEGR 

71521 GCCTGGAGCGCGTGTCCGGCACCTGACCCCCGCCGGCCACCCGGCCGTGAGGCGCGGCTC 7158 0 
LERVSGT* 

71581 GGGACCGGGCCGCCGACCCACCGAAGGGAGGGACCCCATGACCACCCCCATGACCACCCC 71640 

MTTPMTTP (orf12)

71641 CACGACCACCCGCACCACCACCCGCACCGCCGTCTTCGCCCACCTCCGCGCCCCCGGCCT 71700 
TTTRTTTRTAVFAHLRAPGL 

71701 CGGCGACCTCCTCCAGCGCAACATCGGCCTCGCCCTCGTCCGCCGCGCCCGCCCGGCGAC 7176 0 
GDLLQRNIGLALVRRARPAT 

717 61 GGCGGTCACCCTGGTCGTCGGCGAGGACCTGGCGGCCCGCTTCGGTCCGGCACTCACCCG 71820 
AVTLVVGEDLAARFGPALTR

71821 CCACACGTACGCCACCGACGTGCTGCCCTGCCCCCAGCGGGGCGAGGCCGACCCCCGGTG 71880
HTYATDVLPCPQRGEADPRW 

71881 GCCCGCCTTCCTGCGCACCCTGGCCGACCGCCGCTTCGCCCTCGCCGTCGTCGACCCGGA 71940
PAFLRTLADRRFALAVVDPD 

71941 CAGCCAGGGCCTGCACGCCGGCCACGCCCGGGCCGCCGGCGTGCCCGAGCGGATCGGCCT 72000 
SQGLHAGHARAAGVPERIGL 

72001 GCCGCAGGACCGGCCCGGAGACGAACACATCACCCATCCCATCCGCCTCCCACGTCCCCT 72060 
PQDRPGDEHITHPIRLPRPL 

72061 GTGGGGGACCCCGGACCTGTACGAGTACGCCACTGCCCTCGCCGCCGCGCTGGGCCTGCC 72120 
WGTPDLYEYATALAAALGLP 

72121 CGCACCGCCGCGCCCCGGCGACGTCCTGCCGGAGCTGCCCCGCACCCGCGGCGTCCGCCC 72180 
APPRPGDVLPELPRTRGVRP 

72181 GCCGACGGCCGGTCTGCCCCGTCCGCTCGTCGCCGTCCACCCCGGCGGGGCACCGCACTG 72240 
PTAGLPRPLVAVHPGGAPHW 

72 241 GAACAGGAGATGGCCGCTCGAGCACTACGCCCGGCTCTGCGCCCGCCTCGCGGCCGAACT 723 00 
NRRWPLEHYARLCARLAAEL 

72301 CTCGGCCTCCCTCTGCCTGCTGGGCGACGAAGCCGAACGCCCCGAGCTGGAACTGCTCCG 72360 
SASLCLLGDEAERPELELLR 

723 61 GCACGCCGTCCTGACGCGGTCCCCGCGAGCCGTCGTCCACCTCGAGGCGGGCGCGGACCT 72420 
HAVLTRSPRAVVHLEAGADL 

72421 CGACCGGACCGCGAACGTCCTCGCCGACGCCGACCTGCTCGTCGGCAACGACTCCTCGCT 72480 
DRTANVLADADLLVGNDSSL 

72481 CGCCCACGTCGCCGCCGCCGTCCGCACCCCGTCCGTCGTCCTCTACGGCCCGACCGGCAC 72540 
AHVAAAVRTPSVVLYGPTGT 

72541 CGAGTACCTGTGGACCAGGATCTACCCGTACCACCGCGGGGTCTCCCTGCGGTGGCCGTG 72600
EYLWTRIYPYHRGVSLRWPC 

72601 CCAGCGGCTGCGGCACGCCGCAGGCGAACTCGCCGGCCGGCGGTGCGCGCACGGCTGCGT 72660 
QRLRHAAGELAGRRCAHGCV

72 661 CCTGCCCTACCAGGGCCCGGCCGGCCCGTATCCGCGCTGTCTGGCCGACCTGCCGGTGGA 72 72 0 
LPYQGPAGPYPRCLADLPVD 

72 721 CAGGGTCTGGCCGGCGGTGACCGCCCGATGGGCGAGCCCCCACCCCGTGACGATCAGGAG 72 78 0 
RVWPAVTARWASPHPVTIRS 

72 781 TACCCCATGAGCGCCGACCCGTCCCGGGTGCGGACGATCCTCTCCGTCAACTTCAACCAC 72 84 0 
TP* 

MSADPSRVRTILSVNFNH (orf11)

72 841 GACGGCTCCGGCGTGCTGTTGCGGGAGGGCAGGATCGCCGGCTACGTCACCACCGAGCGC 72900 
DGSGVLLREGRIAGYVTTER 

72 901 CGCTCCCGCCTCAAGAAGCACCCGGGCCTGCGCGAGGAGGACCTCGACGAACTGCTGGAC 72960 
RSRLKKHPGLREEDLDELLD 

72 961 CAGGCCGGGGCCGACCTCTCCGACATCGACCACGTCATGCTCTGCAACCTGCACACCATG 73 02 0 
QAGADLSDIDHVMLCNLHTM

73021 GACACACCCGACATACCCCGGCTGCACGGCTCCGACCTCAAGGAGACCTGGCTCGCGTTC 73080 
DTPDIPRLHGSDLKETWLAF 

73081 TGGGTCAACCAGCGCAACGACGAGGTGAGCCTGCGCGGCCGCCGCATCCCCTGCACCGTC 7314 0 
WVNQRNDEVSLRGRRIPCTV 

73141 AACCCGGACCACCACCTCATCCACGCCGCCACCGCCTACTACACCTCCGGCTACGACTCG 7320 0 
NPDHHLIHAATAYYTSGYDS 

732 01 GCGATGGCCGTGGCCATCGACCCCACCGGCTGCCGCGCCTTCGCCGGCAAGGGCAGCCGC 73260 
AMAVAIDPTGCRAFAGKGSR

73261 CTCTACCCCCTGCGCCGCGACCTCGACGCCTGGTTCAACGCCAACATCGGCTACTGCTAC 73320
LYPLRRDLDAWFNANIGYCY 

73321 GTCGCCGACCTGATGTTCGGCTCCAGCATCGTCGGCGCCGGCAAGGTCATGGGCCTCGCC 733 80 
VADLMFGSSIVGAGKVMGLA

73381 CCCTACGGCAGACCCGCCGACGGCGCCGGCCCCGACGAGGAACCGCCCGAGACCGTGCGC 73440 
PYGRPADGAGPDEEPPETVR 

73441 GACTTCGCCGCCCTGGTGGCCCTGGCCGACCGGCACCCGCGCCTCGTCGACGTCGACGGC 735 00 
DFAALVALADRHPRLVDVDG 

73 501 AGGAAGCTCAACGCCACCCTCGCCCACTACATCCAGCTGGGCCTGGAACGCCAGCTGACC 73560 
RKLNATLAHYIQLGLERQLT 

73 561 GCCGTCTTCGCCGAGCTCGCCCCGCTGTGCGCCCGCAACGGCATCGCACCGGACATCTGC 73 620 
AVFAELAPLCARNGIAPDIC 

73 621 CTCTCCGGCGGTACCGCCCTCAACGCCATCGCCACCCAACTCGCCTTCGAGTCGACCGGC 73680 
LSGGTALNAIATQLAFESTG 

73 681 TTCGAGCGCATGCACCTCCACCCCGCCTGCGGCGACGACGGCACCGCGATCGGCGCGGCG 73740 
FERMHLHPACGDDGTAIGAA 

73 741 CTCTGGCACTGGCACCACGTCCTGGGCCACCCCCGGCTCCACCACACCAACGCCGACCTC 73800 
LWHWHHVLGHPRLHHTNADL 

73 8 01 ATGTACTCCGTCCGTGAGTACCCCGAGCACACCGTCCGGCGGGCCGTGCGGGACCACGCG 73860 
MYSVREYPEHTVRRAVRDHA 

73 861 GCCGACCTCGTCGTCGAGGAGACCGGCGACTACGTCGCCAGGGCCGCCGAACTGGTCGCC 73 92 0 
ADLVVEETGDYVARAAELVA 

73 921 GGCGGCGCCGTCATCGGCTGGTACGACGGCGCCGGCGAGGTCGGGCCGCGGGCCCTGGGC 73 9 80 
GGAVIGWYDGAGEVGPRALG 

73 981 CACCGCAGCATCGTCGCCGACCCGCGCGACCCCGCCATGCGGGACCGGCTCAACTCCCAG 74040 

HRSIVADPRDPAMRDRLNSQ

74041 GTCAAGTTCCGCGAACACTTCCGGCCcTTCGCGCCGTCCGTGCTCAAGGAGCACGCCGCG 74100 
VKFREHFRPFAPSVLKEHAA 

74101 GAGTGGTTCGGCCTCTCCGACAGCCCCTTCATGCTGCGGGCCACCCCCGTCCTCAAGCCC 74160 
EWFGLSDSPFMLRATPVLKP

74161 GGCGTGCCCGCCATCACCCACGTCGACGGGACGTCGAGGATCCAGTCGGTCACCCGCCAG 74220 
GVPAITHVDGTSRIQSVTRQ 

74221 GACACCCCCGCCTTCCACGACCTCATCCACGCCTTCAAGGACCGTACGGGGATCCCCATG 742 80 
DTPAFHDLIHAFKDRTGIPM

74281 GTGCTCAACACCAGCCTCAACACCAAGGGCGAGCCGATCGCGGAGACACCCGAGGACGCC 74340 
VLNTSLNTKGEPIAETPEDA 

74341 CTGCGCACCCTGCTCGGCTCCCGGCTCGACCACCTGGTGCTCCCGGGCCTCATCGTCAGC 744 00 
LRTLLGSRLDHLVLPGLIVS

74401 GGCCGGACGGCGGCCCGCTCATGAGCGCCCCGCGGGGCGAGCGGACCCGGCGCCGCGCGC 74460 

MSAPRGERTRRRAL (orf10)
GRTAARS * 

74461 TCGAACGCGACATCGCCGCGATCTGGGCCGAGACCCTCGGCAGGGACAGCGTCGGCCCGC 74520 
ERDIAAIWAETLGRDSVGPH 

74521 ACGAGGACTTCGCCGCGCTGGGCGGCAACTCCATCCACGCCATCAAGATCACCAACCGGG 74580 
EDFAALGGNSIHAIKITNRV

74581 TGGAGGAACTCGTCGACGCCGAGCTGTCCATCCGCGTCCTGCTCGAGACGCGCACCGTGG 74640 
EELVDAELSIRVLLETRTVA 

74 641 CCGGCATGACGGACCACGTCCACGCCACGCTCACGGGGGAGCGGGACCGGTGAACACCGA 74700 

GMTDHVHATLTGERDR* 

M N T D (orf9)

74701 CCTGCCCCGGCTGCTCGACCGGATCGCCGGCCTGCGCGTCCTCGTCATCGGCGACGTCAT 74760
LPRLLDRIAGLRVLVIGDVI 

74 761 CCTCGACACCTACGTCTGGGGAGCCACCTCGGGCCTGTGCCGCGAATCCCCCGTCCCTGC 74820 

LDTYVWGATSGLCRESPVPA 

74821 CGTCACCCTGACCTCCGTCGCCCACCAGTGCGGCGGCGCCGCCAACGTCGCCGTGAACCT 74880 
VTLTSVAHQCGGAANVAVNL 

74881 CCGGGCGCTCGGCGCCGAACCGGTGCTGCTCTCCGCGACGGGTGACGACCGCGCCGGCCG 74940 
RALGAEPVLLSATGDDRAGR

74941 CCGGCTGCGCGAAGCCCTCCGTGCGCGGGACGTCGACACCGGCGGACTCTTCGTACAGCC 75000 
RLREALRARDVDTGGLFVQP 

75 0 01 CGGCCGGACCACGGTCACCAAACGCCGCGTCATGGCCGACGGACAGATGCTGCTCCGCCT 750 60 

GRTTVTKRRVMADGQMLLRL 

750 61 CGACGAGGGCGGCGAACACCCGTTGCCCGTGGCGACGGACACCGGAAGCCGCCTGCTCGA 7 5120 
DEGGEHPLPVATDTGSRLLE 

75121 ACGGGCCGCCGGCCTGCTGCCCGCCGTCGACGCCGTGATCGTCTCCGACTACGGGTACGG 75180 
RAAGLLPAVDAVIVSDYGYG 

75181 CGTGTGGGAGCCCGACACCGTCGCCCGGCTCGCCGCACACCGCGAACTCGGCCCGTCCAC 7524 0 
VWEPDTVARLAAHRELGPST 

75241 CCTGGTCGTCGACTCCCGCCGGCCCGCGCGCTTCACCGCGCTGCGGGCCAGCGCCGTCAA 753 0 0 
LVVDSRRPARFTALRASAVK 

753 01 ACCCAACCACGCGGAGGCGCTGCGCCTGCTCGACGCCGGCGAACCCCCGCCCGGCCCGGC 7 53 60 
PNHAEALRLLDAGEPPPGPA

753 61 CAGGGCGGACTGGGCGGCCGCCCTCGGCGACCGGCTCCTGCGCCTGACGGGAGCCGAACG 75420 
RADWAAALGDRLLRLTGAER 

7 5421 GGTCGCCCTCACCCTGGACGCCGACGGATCACTGCTCTTCGAACGCGACCGGCCCCCGGT 75480 
VALTLDADGSLLFERDRPPV 

75481 CCGCACGTTCGCCCGGGGCAGCCGGGCACCGGTCACGGCCGCCGTCGGCGCCGGCGACGC 75540 
RTFARGSRAPVTAAVGAGDA 

75541 CTTCACCGCGGCCCTCACCCTCGCCCTCGCCGCCGGCGCCGACTCCGCGGTCGCCGCCGA 75600 
FTAALTLALAAGADSAVAAE 

7 56 01 ACTGGCCTCCGCCGCCGCCGGCACGGCCGTCGCCACCCCCGGCACCAGCACCTGGCACGC 75660 
LASAAAGTAVATPGTSTWHA 

756 61 CGACGAACTGCGCCGACTGCTCGGCGGCACCGGCAAGGTCTGCCGGACCGGCACCCTGCC 7572 0 
DELRRLLGGTGKVCRTGTLP 

75721 CGCCCGGCTGCTCGACCCGGCCGCCCGCGACCGCCGGGTCGTCTTCACCAACGGCTGCTT 75 78 0 
ARLLDPAARDRRVVFTNGCF 

75781 CGACCTCCTGCACGGCGGCCACGTCTCCTGCCTGAGCCGGGCCAAGGAACTGGGCGACCT 75840 
DLLHGGHVSCLSRAKELGDL 

7 5841 GCTCGTCGTCGGCGTCAACTCCGACGCGAGCGTCCGACGCCTCAAGGGCCCCCGTCGCCC 7590 0 
LVVGVNSDASVRRLKGPRRP 

7 5901 GGTGATCCCCCTCGCCGAACGCATGCGCGTCCTCGCCGCCCTGAGCTGCGTGGACCTCGT 75960 
VIPLAERMRVLAALSCVDLV

75961 CGTGCCCTTCGACGACGACAGCCCCGCCGCCCTCATCGAGGCCCTCCGCCCCGAGGTCTA 76 02 0 
VPFDDDSPAALIEALRPEVY 

76021 CGCCAAGGGCGGGGACTACACCCTCGCGACCCTGCCCGAAGCACCCCTCGTCCAACGGCT 76080 
AKGGDYTLATLPEAPLVQRL 

760 81 CGGCGGCGTCGTCCACCTGCTCCCCAGCGTCGCCGACACCTCCACCACCGACATCATCCG 7614 0 
GGVVHLLPSVADTSTTDIIR

76141 GCGCATCCACGCCCTGTCCAGGACCGGCGAGGGAGACACCCCATGAGCCACGCCATCGGA 76200 

M S H A I G (orf8)
RIHALSRTGEGDTP*

762 01 CCGAGCCGGCTGATCCCCGCCATCCGCGAAGCGCTCGGGGACGAGAAGGACCCCCGGCTC 762 6 0 

PSRLIPAIREALGDEKDPRL

76261 GCCCTCTACGTCCACGTCCCCTTCTGCTCCTCCAAGTGCCACTTCTGCGACTGGGTCACC 7632 0 
ALYVHVPFCSSKCHFCDWVT 

76321 GACATCCCCGTCGCACGCCTGCGCGGCGACAGCCGGGAACGCTCGCCCTACGTCACCGCC 76380 
DIPVARLRGDSRERSPYVTA 

763 81 CTCTGCGACCAGATCCGCTTCTACGGCCCCCAGCTCACCCGGCTCGGCTACCGCCCCGAG 7644 0 

LCDQIRFYGPQLTRLGYRPE

76441 GTCATGTACTGGGGCGGCGGCACCCCCACCCGGCTCACCGGCGACGAGATGACGGCCGTC 76500 
VMYWGGGTPTRLTGDEMTAV 

76501 CACCAGGCCCTCGACGACGCCTTCGACCTGACGGGACTCCGCCAGTGGTCGGTGGAGAGC 76560 
HQALDDAFDLTGLRQWSVES 

7 6561 ACCCCGAACGACCTCGACCCCGCCACCCTCGACACCCTGCGCGGCCTCGGCGTCACCCGC 7662 0 
TPNDLDPATLDTLRGLGVTR 

7 6621 GTCAGCGTCGGCGTCCAGTCGCTCAACCCGTACCAGCTGCGCAAGGCAGGCCGGGCCCAC 76 68 0 
VSVGVQSLNPYQLRKAGRAH 

7 6681 TCGCGCGAACAGGCCCTGGCCGCCGTCCCCCTGTTGCGCCGCGCCGGCATCGACGAGTTC 7674 0 
SREQALAAVPLLRRAGIDEF 

7 6741 AACGTCGACCTGATCGCCGGCTTCCCCGGCGAAGCCGTCGAGTCCTTCGAGGAGACCCTG 76 80 0 
NVDLIAGFPGEAVESFEETL 

7 68 01 CGCACCGTCCTCGCGCTCGACCCGCCGCACGTCTCCGTCTACCCCTACCGCGCCACCCCC 76860 
RTVLALDPPHVSVYPYRATP 

768 61 AAGACGGTCATGGCCATGCAGCTCGACCGCGAGTTCGTCGAGGCCCGGAACCGGGACGGC 76 920 

KTVMAMQLDREFVEARNRDG 

76921 ATGATCGACGCCTATGAACGGGCCATGGCCGCGCTCGGCGCCGCCGGCTATCACGAGTAC 76980 
MIDAYERAMAALGAAGYHEY 

769 81 TGCCACGGCTACTGGGTGCGCGACGCGCGCCACGAGGACCAGGACGGCAACTACAAGTAC 77 04 0 

CHGYWVRDARHEDQDGNYKY 

77041 GACCTGGCCGGCGACAAGATCGGCTTTGGCAGCGGCGCCGAATCGATCATCGGTCACCAC 7710 0 
DLAGDKIGFGSGAESIIGHH

77101 CTGCTCTGGAACGAGAACAGCGCCTACGCCCGCTACCTGCTCGCCCCCCGCGAGTTCTCC 7716 0 
LLWNENSAYARYLLAPREFS 

77161 GCCGCCCACCGGTTCACCACCGCCGAACCCGACCGCCTGACCGCCCCCGTCGGCGGCGCG 7722 0 
AAHRFTTAEPDRLTAPVGGA

772 21 CTGATGACCCGTGAAGGCGTGGTCTTCGCCCGCTTCCGCAGACTGACCGGCCTGGACTTC 7728 0 
LMTREGVVFARFRRLTGLDF 

772 81 GCGGACGTCCGCGCCACACCGTACTTCCGCCAGTGGTTCGAGCTCCTGGAGCGCTGCGGC 7734 0 
ADVRATPYFRQWFELLERCG 

77341 GGCCGCTTCGTCGAGACGCCGTACAGCCTCCGCCTGGAGCCGTCCACCATCCACCGCGCC 77400 
GRFVETPYSLRLEPSTIHRA 

77401 TACATCACCCACCTCGCCTACACCATGGCCCATGGCCTGGCCCCCGAACGCGCCTGA 77457 
YITHLAYTMAHGLAPERA* 



SEQ ID NO: 2 ORFS BLM gene cluster ORFs 31-40 

(Note: this part is on the reverse strand, and its last nucleotide (18660) is
the first (1) of the whole cluster of 77457 bp. Also, the last ORF (40) is
incomplete and contains frame shifts.)
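
The coordinate relationship described in the note above can be illustrated with a short sketch. It assumes, as the note suggests, that SEQ ID NO: 2 is the reverse complement of the first 18660 bp of the 77457 bp cluster; the function names below are hypothetical and are not part of the original disclosure.

```python
# Minimal sketch, assuming SEQ ID NO: 2 is the reverse complement of
# nucleotides 1-18660 of the whole cluster (an interpretation of the note).
SEQ2_LENGTH = 18660  # length of SEQ ID NO: 2 stated in the note

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string, preserving case."""
    return seq.translate(_COMPLEMENT)[::-1]


def seq2_to_cluster(pos: int, seq2_length: int = SEQ2_LENGTH) -> int:
    """Map a 1-based position in SEQ ID NO: 2 to the 1-based position
    of the paired base on the whole cluster."""
    if not 1 <= pos <= seq2_length:
        raise ValueError("position outside SEQ ID NO: 2")
    return seq2_length - pos + 1
```

Under this assumption, position 18660 of SEQ ID NO: 2 pairs with position 1 of the whole cluster, and the first codon of orf31 (positions 1-3 here) lies at cluster positions 18658-18660 on the complementary strand.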
1 GTGACCGAGAACCTTCCGTCGTGCCCCGAATGCTCCAGCGCGTACACCTATGAGATGGGT 60
MTENLPSCPECSSAYTYEMG 
(orf31) 



361 ACGCCGGCCCAGGCCCTGCCCAGGCTCCACTACGCCGCGGCGCAACCGAGCCGGAACGGG 420
421 GCCCGGGCCCGCTCCAAGTCCCGTTCCGTGCGCGGCCGCGGCAGCCAGGCCGTGTTCACC 480

481 CTGGGGTCGCCGTCCCCGTTCGCACGCGTCGTACACGCCACCACGCACGGCACGGAAGTC 540
541 CCCGAACTCGCCACGTTCCCCAAGTCCCCGCGTGCCCGGATCCGCCCGGACCGGCGTCGG 600



661 GAACGCCGGCCGGAAATTTACGTATAGGTAGAGATCCCGGCGAAGCGATCGGCGCGTTAT 720 

721 GGCAGCATCCGCGCCGGCCCGCCGCGCAGTTCCTCGGTCCCGGACCGATGGCGTCAAAAG 780

781 TGAGCGACGAAATCGCCGGATCGCGCGAGGACCGTCGCGGGCCGCACGAGGACAACCGGG 840

841 GGATATATCAGCGCATTCCCAGGTCACGCGTTGACTGGAAATCGCCTACTTATCGCGTCA 900 

901 CGCCTGTAGGGATCATGGCCGGGAATGGCCTCAGACGCTTTGAGTGCCCACCTTGAGGTT 960 
MASDALSAHLEV 
(orf32)

1381 TTCCTGGACGACGCTCCTCCGACCTGCTCCGCCGTCCTTCCACGGCACTCAGCGGGGACC 1440
FLDDAPPTCSAVLPRHSAGT 



1441 GCGTCGGAAATCGCCTATGTGCTGTACCCGACGACTCCTGACGAGAAGTCCGAAAATTCG 15 00 
ASEIAYVLYPTTPDEKSENS 



15 01 GTCGTCTCCTATCGTGATATGGCGCGCTACCTTGACGACCCCACTGCCGGGATTCCGGCG 1560 
VVSYRDMARYLDDPTAGIPA 



1561 AGGGCGGAGATTCTCCGGCTGGTCGCGCCGCTCCTGTCCGGCGGTCGTCTGGTGCTGGAC 162 0 
RAEILRLVAPLLSGGRLVLD 



1621 GCCGACGAGACCCGGCCCCGGCCGGTCACCCGTGAGGCGCCGCGCGACATGGTGGAGGAC 1680 
ADETRPRPVTREAPRDMVED 



1681 GTCGTGGCGCAGGTCTGGTGCGCCGTGCTCGGCGTGGACCGGGTGGGCGTGCGGGACCGC 174 0 
VVAQVWCAVLGVDRVGVRDR 

1741 TTCTTCGACCTGGGCGGCAAGTCGCTGGCGGCGGTCCAGGTGGTGGCGCGCCTGCGGAAG 1800 
FFDLGGKSLAAVQVVARLRK 



18 01 CTGCTCGGCGTCGAGCTGCCGCTGCGGGCCCTGTTCGACGCGCCGACGGTCGAGGAGCTG 1860 
LLGVELPLRALFDAPTVEEL 



1861 GCCGCCCGGGTGCGGGCCGAACAGGCCGGCGGCCAGGGCGTCCGGGAGGAGGCGGCGGTC 192 0 
AARVRAEQAGGQGVREEAAL 

1921 GAGCCGGTGGGCCGGAGCGAGCCGCTGCCGCTGTCGTTCGCACAGCAACGCCTGTGGTTC 1980 
EPVGRSEPLPLSFAQQRLWF 

19 81 CTGGACCGCTTGATGCCCGACCGCGCCTTCTACACGATGTGCGACGCGTTCCGCGTCCGG 2 04 0 
LDRLMPDRAFYTMCDAFRVR 



2 041 GGCGGGATCGACCTGGGTGCGCTGCGGCGGGCCCTGCGGATGCTGGTGGGACGGCACGAG 2100 
GGIDLGALRRALRMLVGRHE 



2101 ACGCTGCGGACGGCGTTCGTCGAGCGGGACGGTGTGCCGTACCAGCTCGTCGGTCCGGCC 216 0 
TLRTAFVERDGVPYQLVGPA 

2161 GACGGGCCCGGTGCGCGGCGCGTGGCCGCTCCCACGCGGGTCGACCTGTCGCTGCTGGAG 2 22 0 
DGPGARRVAAPTRVDLSLLE 



2221 CCCGCCGAGCGGGAGGAGGCGGTGCGGAACCTGGTGGCGGCGGAGGCGCGGACCCCGTTC 2280 
PAEREEAVRNLVAAEARTPF

22 81 CGGCCGGCGGACGGCGCGCTGCTGCGCGTGGTGGTGGCCCGGCTGGCGGACGATGATCAC 2340 
RPADGALLRVVVARLADDDH 



2341 GTGCTGGTGGTCAGCACGCACCACATCGTCTCCGACGCCTGGTCCGTGGGTGTGCTGGTG 240 0 
VLVVSTHHIVSDAWSVGVLV 



24 01 GACGAACTCGGACGGCTGTACCGCGAGTGCGTCACCGGAGATCCCGCCGCGCTGCCCCCG 24 6 0 
DELGRLYRECVTGDPAALPP 



24 61 CCGGCCGTCCAGTACGCCGACTTCGCGGTCTGGCAGCGGGCCTGGATGGCCGGTCCGGTG 252 0 
PAVQYADFAVWQRAWMAGPV 

2521 CAGGAGGAGCATCTCGCGTACTGGAAGCGGGCCTTGGACGGCGCTCCCTCGGTGCTGCGG 258 0 
QEEHLAYWKRALDGAPSVLR 

2581 CTGCCCATGGACCACCCGCGGCCCGCCGTGCAGTCCGAGCGGGGCGAGACGGTCGGGTTC 2640
LPMDHPRPAVQSERGETVGF

2641 GCGCTGCCCGACGCGCTGGTCGCCGCGCTGGAGAAGCTGGGCCGGGAGCAGGGCGCCACC 2 700 
ALPDALVAALEKLGREQGAT 

2701 CTGTTCATGACGCTGCTCGGCGCCTTCCAGGTCCTGCTGGCGCGTCACGCCGGGCAAGAG 2760 
LFMTLLGAFQVLLARHAGQE 

27 61 GACATCGTGGTCGGCGTGCCGGCGGCGGGGCGCACCCGGACCGAGACGGAACCTCTGGTC 2 82 0 

DIVVGVPAAGRTRTETEPLV 

2821 GGCTTCTTCGTCAACACGCTTCCCTTGCGGGCGATCTGCGCTCCGGGCCTGTCGTTCCGG 2880 
GFFVNTLPLRAICAPGLSFR 

28 81 GACCTGCTGGACCAGGTGCGCGAGGCCGCCCTCGGCGCCTTCGCCCATCAGGACCTCCCC 2 94 0 

DLLDQVREAALGAFAHQDLP 

2941 TTCGAGGCGCTGGTCGAGGCGCTCGCACCCGAGCGCGACCTCGGCCACAATCCCCTCGTC 3000 
FEALVEALAPERDLGHNPLV 

30 01 CAGGTCACCTTCCAGCTCCTGGGCACACCGGCGGCGCGGCCGGACCTGATCGGGACGGAG 3060 
QVTFQLLGTPAARPDLIGTE

3061 GTCGAGCGGTACCCGGTCCAGGAGGCCGTCTCGCAGTTCGACCTGTCCCTGGACATCAAG 3120 
VERYPVQEAVSQFDLSLDIK 

3121 CGGGCCGACGACGGTTCCTACCGGGGGATCCTGAACTACTGCCCCGACCTGTTCGACCGA 3180 
RADDGSYRGILNYCPDLFDR 

3181 CGCCGCATGGAGGTGCTGGTCGGCCACTACCTGACGCTGCTCGGCGCCGCCGCCGCGGAC 3 24 0 
RRMEVLVGHYLTLLGAAAAD 

3241 CCGGGCCGCCCGATCGGTGAGCTGCCGCTGTCCGACGGGGCCGAACGGCTGCGGCTGCTC 3300 
PGRPIGELPLSDGAERLRLL 

33 01 GACGGGTTCGGGAAGCGGGACGCGGCGTACGCCGGGCCGGGAAGCGTTCCGGAGCGGTTC 3360 
DGFGKRDAAYAGPGSVPERF 

33 61 GCGGAGGTGGCGCGGACGGCACCGGACGCGCGGGCGGTGACGTGTGGCGCGACAACGCTC 3420 

AEVARTAPDARAVTCGATTL 

34 21 ACCTTCGCCGAGCTGAACGACCGGGTGGAGCGCCTGGCACAGGCACTGCTCGGCGCCGGG 3 48 0 

TFAELNDRVERLAQALLGAG 

34 81 GTCACCCGCGAGACGCCGGTCGCGGTCCGCCTGCCCCGTTCCACCGACAGCGTCGTCGCC 3 54 0 
VTRETPVAVRLPRSTDSVVA 

3541 CTGCTGGCCGTCATGCGGGCGGGCGGCGTCTACGTCCCCCTGGACCCCGACTGGCCCGCG 3 600 
LLAVMRAGGVYVPLDPDWPA 

3601 GACCGCACCGCCTACATCCTGGACGACACCGCGGCCTCCGTCGTCATCACCCGCGACCTG 3 660 
DRTAYILDDTAASVVITRDL 

3 661 CCCGCACTCCCCGGTCGCCTCCACGTCGACCCGCGCCGGCCCGCGGCCGACGGCCTGGTA 3 720 
PALPGRLHVDPRRPAADGLV 

3721 CCCGCGCCCCGCATCGACCCCGATCAGGCCGCGTACGTCATCTACACGTCCGGCTCGACG 3 780 
PAPRIDPDQAAYVIYTSGST 

3781 GGCGCGCCGAAGGGCGTCGTCGTCCGGCACCGCTCCCTGAACCACCTCACCAGCGCCCTG 3 840 
GAPKGVVVRHRSLNHLTSAL

3841 CAGGCCACCTTTCTCGGCCACGACCCGTATCTCGCCGGGGCCGACGGCGTACCGCCCGGG 3900
QATFLGHDPYLAGADGVPPG 



3 901 GACGCGAAGCTGCGTACGACGCTCACCGCGCCCTTCACGTTCGACGCGTCCATGGAGCAA 3 9 6 0 
DAKLRTTLTAPFTFDASMEQ 



3961 CTGAGCTGGATGCTGGCCGGTCACGAGCTGTTCATCGTGCCCGAGGACGTGCGGCGCGAC 4 020 
LSWMLAGHELFIVPEDVRRD 



4021 CCCTCGGCGCTGGTCCGGTTCGTCCGGGAGCACCGGATCGACGTCATCGACACGACCTCC 4 080 
PSALVRFVREHRIDVIDTTS 



4 081 TCGCAGCTCGAACTCCTCGTATCGCACGGGCTGTTGGACGGAGAGTGGGCGCCGTCCATG 4140 
SQLELLVSHGLLDGEWAPSM 



4141 GTCATGGTGGGTGGCGAGGCGGTCTCGCCGTCGCTGTGGCGGACCTTGCGGGACCAGCGG 4 2 00 
VMVGGEAVSPSLWRTLRDQR 



42 01 CGCACTCGCTGTTTCAACCTGTACGGGCCTACGGAGGCGACGGTCGACGCCACCTGCCAC 4 260 
RTRCFNLYGPTEATVDATCH 



4261 GACCTGTCCGACCCCGCCGACGTCCCCGTCATCGGCACCCCACTCCCCCACACCCACGTC 4320 
DLSDPADVPVIGTPLPHTHV 



4321 CGCGTGCTCGACGACCGACTGCGACCCGTACCCGTGGGCGTCGCCGGCGAGATCTACCTC 4 380 
RVLDDRLRPVPVGVAGEIYL 



43 81 GGCGGAACCGGCCTGGCCCGCGGCTACCTCAACCGCCCCGCCCTCACCGCCCGACGCTTC 444 0 
GGTGLARGYLNRPALTARRF 



4441 GTCGCCGACCCCTACCCCGACACCCCCGGCAGCCGCCTGTACCGCACCGGCGACCGCGCC 4 5 00 
VADPYPDTPGSRLYRTGDRA 



4501 CGCTGGCGCCCCGACGGCACCCTCGAATACCTGGGACGCACCGACGACCAAATCAAGATC 4 56 0 
RWRPDGTLEYLGRTDDQIKI 



45 61 CGCGGCTTCCGCGTCGAACCCGGCGAAATCGAGGCCGTCCTCACCCACCACCCCGCCGTC 4 62 0 
RGFRVEPGEIEAVLTHHPAV

4 621 AAGGAAGCCGCCGTCGTCGACGACGCGCACGCGCGGCTGGTCGCCTACGTCACGCTCGCG 4 68 0 
KEAAVVDDAHARLVAYVTLA 



4681 GAAGGCGGCGGCGCCGGCCCCACCGACGTACGCCGGTTCGCGCAGGGGCGGCTGCCCGCC 474 0 
EGGGAGPTDVRRFAQGRLPA 



4741 CACATGGTGCCGTCGGCGGTGGTCGTCCTGGAGGCGCTGCCACTGACGTCGAACGGAAAG 4 80 0 
HMVPSAVVVLEALPLTSNGK 



48 01 CTGGACCGCGCGCGCCTGCCGGCGCCCGCGGCGGGCAGACCGGAACTGGATGTCCGCTTC 4 86 0 
LDRARLPAPAAGRPELDVRF 

4 8 61 GTGGCGCCGCGCGACATGGTGGAGGAGGTCGTGGCGCAGGTCTGGTGCGCCGTGCTGGGC 4 92 0 
VAPRDMVEEVVAQVWCAVLG 

4 921 GTCGACCGGGTCGGTGTGCACGACGACTTCTTCGAGCTGGGCGGGCACTCGTTGCTGGTG 4980 
VDRVGVHDDFFELGGHSLLV 



5041 TTCGACGCCGCGACGGTCGAGGAGCTCGCCGCCCGCGTCCGCGCCGCACGGACCGAGGGC 5100
FDAATVEELAARVRAARTEG

5101 CTCGGCCGGGGGGCCGCCCCGCCCCTCGGGCCGGTGGACCGGAGCGGGCCGCTGCCGCTG 5160 
LGRGAAPPLGPVDRSGPLPL 

5161 TCGTTCGCGCAGCAACGCCTTTGGTACCTCGATCAGTTGGCGCCCGACAGTGTCTCCTAC 5220 
SFAQQRLWYLDQLAPDSVSY 

5221 AACATGTGCGACGCCTACCGGGTCCGCGGCCCTCTCGACCTGGACGCGCTGCGGCGGGCG 52 80 
NMCDAYRVRGPLDLDALRRA 

5281 CTGCGGACGCTGGTCGAGCGGCACGAGACGCTGCGGACGGCGTTCGTCGAGCGGGACGGG 5340 
LRTLVERHETLRTAFVERDG 

5341 GTGCCCCACCAGGTGGTCTCGGCGCCCGACGCGCCGGCCGCGCGGCGCGCGGCGGAGGTC 5400 
VPHQVVSAPDAPAARRAAEV 

5401 GTGCGGATCGAGGCGGCCGGGCGGACCGACGAGGCGGTGCGGGACCTGGTGGCCGCGGAG 54 60 
VRIEAAGRTDEAVRDLVAAE 

5461 GCGCGCACCCCGTTCCGGCCGGCGGACGGCGCGCTGATGCGCGTGGCGGTGGCCCGGCTG 5520 
ARTPFRPADGALMRVAVARL 

5521 GCGGACGACGATCACGTGCTGGTGGTCACCACGCACCACATCGTCTCCGACGGCTGGTCG 5580 
ADDDHVLVVTTHHIVSDGWS 

5581 GTCGACATCCTGGTGGACGAATTGGGGCGCCTGTACCGGGAACACGTCACGGGTGACCCC 56 4 0 
VDILVDELGRLYREHVTGDP 

5641 GCCGGGCTCCCTCCGCTCGACGTCCAGTACGCCGACTTCGCCGTCTGGCAGCGGTCCTGG 57 00 
AGLPPLDVQYADFAVWQRSW 

57 01 ATGACCGGCCCCGTGCGGGAGGAGCACCTCGCGTACTGGAAGCGGGCCCTGGACGGGGCA 5760 
MTGPVREEHLAYWKRALDGA 

5761 CCCTCGGTCCTGCGGCTGCCGGCGGACCATCCGCGTCCCGCCGTCGAGTCCCAGCGGGGC 582 0 
PSVLRLPADHPRPAVQSQRG 

5821 GAGACCGTCGAGTTCCCCCTGCCCGCACCACTGGTCGCGCGGCTGGAAGCGCTCTGCCGG 58 80 
ETVEFPLPAPLVARLEALCR 

5881 GAGCAGGGCGTCACCCTGTTCATGGCGCTCTTCGGCGCGTTCCAGGTGTTGCTGGCGCGC 5 94 0 
EQGVTLFMALFGAFQVLLAR 

5941 TACAGCGGTCAGGACGACGTGGTCGTGGGCGTGCCGACGGCGAACCGCACCCGCGCGGAG 6 000 
YSGQDDVVVGVPTANRTRAE 

6001 ACCGAGCCCCTGGTCGGCTTCTTCGTCAACACCCTTCCGGTACGGGTCGCGTGCTCGCCG 6 060 
TEPLVGFFVNTLPVRVACSP 

6061 GAGCTGTCGTTCCGCGCCCTGCTCGACCGGGTCCGCGAGGCCGCGCTGGGCGCCTTCGCC 612 0 
ELSFRALLDRVREAALGAFA

6121 CATCAGGACCTGCCCTTCGAGGCGCTGGTCGAGGCGCTCGCGCCCGAGCGCGACCTGGGC 618 0 
HQDLPFEALVEALAPERDLG 

6181 CACCACCCTCTCGTGCAGGTCACCTTCCAACTCCTCGACGCTCCCGACGAGAGGCTCGTC 6240 
HHPLVQVTFQLLDAPDERLV 

6241 CTGCACGGCACGGACTGCGTCTCGCTCGGCTTCGGCGGTGTGACCAGCCGGTTCGACCTG 6300
LHGTDCVSLGFGGVTSRFDL

6301 TCCCTCGACGTCGTCTCGGGGCGGCGGGGGAAGCGGTGCGTGCTGACGTACTGTCCCGAC 6360
SLDVVSGRRGKRCVLTYCPD 



6361 CTGTTCGACCGGCCCCGCATGGAGGTGCTGGCCGGCCACTACCTGACCCTGCTCGGCGCG 6420
LFDRPRMEVLAGHYLTLLGA 



6421 GCGGCCGACGATCCCGGTCTCCGCGTCGGCGACCTCCCGCTGAGCGACGACGTCGAACGC 6480
AADDPGLRVGDLPLSDDVER 



6481 CTGCGCCTGCTGGGCGGGTCCCGCCCGCGGTACCTGCCCGCGCCCGGGGCGGAGACCGTC 6540
LRLLGGSRPRYLPAPGAETV 

6541 CCTGACGCCTTCGCCGCGCAGGTGCGGGCGACACCGGACGCGCCCGCGCTGGTCCACGGG 6600
PDAFAAQVRATPDAPALVHG 



6601 GACTCGACGCTGACGTTCGCCGAGCTGGACACCCGGGTCACCGCCCTGGCCGTGCGGTTG 6660 
DSTLTFAELDTRVTALAVRL 



6661 CGGCGCTGCGGCGTGGCCGCCGAGACGCCGGTCGCGGTGTGCCTGCCGCGCTCCGCCGAC 6720 
RRCGVAAETPVAVCLPRSAD 



6721 GCCGTCGTGGCCCTCCTGGCCGTCCTGCGGGCGGGCGGCGTCTATGTGCCAGTGGATCCG 6780 
AVVALLAVLRAGGVYVPVDP 



6781 GAGTGGCCCTCCGGCCGCGTCGCCCACGTCCTCGACGAGACCGCGGCCCCCGTCGTCATC 6840
EWPSGRVAHVLDETAAPVVI 



6841 ACCCGCGACCTGCCCGCCGATCCCGGCCGCGTCCACCTCGACCCGCGCCAGGCCCCGGCC 6900
TRDLPADPGRVHLDPRQAPA 



6901 GACGACCGGGATCCCCTGCCGCGCCTCCACCGCGACCAGGCCGCGTACATCATCTTCACC 6960
DDRDPLPRLHRDQAAYIIFT



6961 TCGGGCTCCACCGGCGCCCCCAAGGGCGTCGTCGTCCGACACGGCTCCCTGTACCACCTC 7020
SGSTGAPKGVVVRHGSLYHL 



7021 CTGGGCCACGTACGGCGCATGGCGGAGGGCGGCCCCCGGCGGAACGTCGCGCACACCACC 7080
LGHVRRMAEGGPRRNVAHTT 



7081 GCGATGACCTTCGACCCGTCGCTGGAACAGTTCCTGTGGCTCGTCGCCGGACACACCCTG 7140
AMTFDPSLEQFLWLVAGHTL 

7141 CACGTCGCGCCCGAGGAGGTGCGCCGCGATCCCGAGGCGCTGGTGGCCCTGGTGCGGCGC 7200
HVAPEEVRRDPEALVALVRR 



7201 GCCGCGATCGACGTCCTCAACGTCACCCCGTCCCACCTGACCCTGCTGATCGAGGCCGGG 7260 
AAIDVLNVTPSHLTLLIEAG 



7261 CTGCTGGAGGGCGACCGGGTGCCGGGTACGGTCCTGGTGGGTGGCGAGGCGGTGCCCGCG 7320
LLEGDRVPGTVLVGGEAVPA 



7321 GCGCTGTGGCGGACCCTGCGCGAACGGACGGGAGCCACCCGCTTCTTCAACCTGTACGGG 7380
ALWRTLRERTGATRFFNLYG 



7381 CCTACGGAGGCGACGGTCGACGCCACCTGCCACGACCTGTCCGACCCCGCCGACGTCCCC 7440
PTEATVDATCHDLSDPADVP 



7441 GTCATCGGCACCCCACTCCCCCACACCCACGTCCGCGTGCTCGACGACCGACTGCGACCC 7500 
VIGTPLPHTHVRVLDDRLRP 



7501 GTACCCGTGGGCGTCGCCGGCGAAATCTACCTCGGCGGAACCGGCCTGGCCCGCGGCTAC 7560
VPVGVAGEIYLGGTGLARGY

7561 CTCAACCGCCCCGCCCTCACCGCCCAACGCTTCGTCGCCGACCCCTACCCCGACACCCCC 7620 
LNRPALTAQRFVADPYPDTP 

7621 GGCAGCCGCCTGTACCGCACCGGCGACCGCGCCCGCTGGCGCCCCGACGGCACCCTCGAA 7680 
GSRLYRTGDRARWRPDGTLE 

7681 TACCTGGGACGCACCGACGACCAAATCAAGATCCGCGGCTTCCGCGTCGAACCCGGCGAG 7740 
YLGRTDDQIKIRGFRVEPGE 

7741 ATCGAAGCCGTCCTCACCCACCACCCCGCCGTCAAGGAAGCCGCCGTCACCGTGGCCACC 7800
IEAVLTHHPAVKEAAVTVAT 

7801 GACGACGGTGCCGCCCGGCTGGTCGCCCTCGTCGTCCCCGCCCCCCGCGCCCCGCACGGC 7860
DDGAARLVALVVPAPRAPHG 

7861 GATTCGGCCGACGGCGCCCCGGACGCCCAGGTCGAGGAGTGGAACGCCGTCTTCGAGGCG 7920 
DSADGAPDAQVEEWNAVFEA 

7921 ACCCACACCGACGCCGCCGACGGCGAACTCACCTTCAACATCAAGGGCTGGAACGACAGC 7980 
THTDAADGELTFNIKGWNDS

7981 CTCACCGGTGCGCCGATCCCCGCCGAACACATGCGGGAATGGGTCGACACCACCGTCGCC 8040 
LTGAPIPAEHMREWVDTTVA 

8041 CGGCTCCTGGAACGGCCGGCCGAGCGCGTCCTGGAGATCGGCAGTGGCACCGGGCTGCTG 8100
RLLERPAERVLEIGSGTGLL 

8101 ATGTGGCGGCTGCTGCCGCACGTCACCGAGTACACCGGAACCGACTTCTCGCGGCCCGCC 8160 
MWRLLPHVTEYTGTDFSRPA 

8161 GTGGACTGGCTCCGGGACGGGCTGCGCCGCCGCCCCGCGCACCGGGTACGGCTGCTGCAC 8220 
VDWLRDGLRRRPAHRVRLLH 

REATDFTGVRAASTDLVVVN 

8281 TCGGTCGTCCAGTACTTCCCCGACCGCGCCTACCTCGACACCGTCCTGGCCCGCGCCCTC 8340
SVVQYFPDRAYLDTVLARAL 

8341 GACGCCACGGCCGACCGAGGGCGCGTCTTCGTGGGCGACGTGCGCAACCTGGCCCTCGCC 8400
DATADRGRVFVGDVRNLALA 

8401 CCGCAGTTCTACGCCCGTCAGGCCCTCGCCCACGCCGGTCCGGGCGCGGCGGCGCGGGAC 8460
PQFYARQALAHAGPGAAARD 

8461 GTGGCGCGCGCCGCCGGCGAGTTCGCGGCCATGGACGGCGAGTTGCTGGTGTCCCCCGCG 8520
VARAAGEFAAMDGELLVSPA 

8521 TACTTCGCCGCGCTCGCCGCCCGCTCGCCCCGCGTCACCGGCGTCGAGATCCTGCCCCGC 8580 
YFAALAARSPRVTGVEILPR 

8581 CGGGGACGGCACCGCAACGAGATGAGCCTGTACCGCTACGACGTGGTGCTGCACGTGGGC 8640
RGRHRNEMSLYRYDVVLHVG 

8641 GGTGACCGCCCGGCGGCCCCGGAGGCGGAGGTGCTCACCTGGGGCGACCAGGTGCACGAC 8700 
GDRPAAPEAEVLTWGDQVHD 

8701 CTCGCGTCGCTGTCCGCCCGCCTCGGCCGCGGGGGCCCGGACGCCCTGCTCGTGCGCGGC 8760
LASLSARLGRGGPDALLVRG 







8761 GTCGCCAACGACCGTCTGACGCGGGACAACGAGCTGCTCGACGCACCCGCCCGCACGACG 8820
VANDRLTRDNELLDAPARTT 



8821 GCCGTCGAGCCCGAGGACCTGTGGGGGCTGGCGGACTCCACCCCCTACCGGGTGAGCGTC 8880
AVEPEDLWGLADSTPYRVSV 



8881 AGCTGGGCCGCCGCCGATCCGCGGGGCGCGATGGACGTCCTGCTGGTCCGGCGGGACGCC 8940
SWAAADPRGAMDVLLVRRDA



8941 CACGACGACGGTCCGCTGCTCGTCCCCCACCCCGTACCGGAGCCCTCGGCACCGCTGACG 9000
HDDGPLLVPHPVPEPSAPLT 



9001 AACACGCCGACCCGGCACCCGTCCGCGCGGCAAGGGGGCTCGGCCGCGGACGGGCTGCGT 9060
NTPTRHPSARQGGSAADGLR 



9061 TCCTGGCTCGCCGAGCGGCTTCCCGCGCACCTGCTGCCCGCGAGGATCACCGAGGTGGAC 9120 
SWLAERLPAHLLPARITEVD 



9121 GCGCTGCCCCGCACCGGCACCGGCAAGCTCGACCGGGGCGCGCTCGGCGGACTCGTGACC 9180 
ALPRTGTGKLDRGALGGLVT 



9181 GCGGGCCGTGGCGCCCGGGCGGGCGACCGCCCCGCCACCGCCCCCCGTACGGGTCTCGAA 9240
AGRGARAGDRPATAPRTGLE 



9241 CGGACCCTGGCCGACGCGTGGGCGCGGGTGCTCGGCCTCCCCGAAGTCGGCGTGCACGAG 9300
RTLADAWARVLGLPEVGVHE 



NFFALGGDSLLAVRAVARCR 



9361 CGTGCCGGGGTCCGACTGACCGTCCGGCAGTTGCTGAGCGAGCAGACCGTCGCCGCGCTC 9420
RAGVRLTVRQLLSEQTVAAL 



9421 GCGGCGGCCCTCGAGGAGGAGTCTCAATGATGAAGTCAAGCCGCTTGCGCGACCGGCAGC 9480
AAALEEESQ* 

MMKSSRLRDRQL 
(orf33) 

9481 TCGGGGGTGAAGACCCGGTTGTCGCGCAGGAGAGCCCACAGGACGCTGGCCCGACGCCGT 9540
GGEDPVVAQESPQDAGPTPC 



9541 GCCAGGGCGATGACGGCTTGAACGTGTTTGCAGCCCTCGCCGCGCTTCTTGAGGTAGAAG 9600
QGDDGLNVFAALAALLEVEV 



9601 TCCCGGTTCGGCCCCTCCCGCATCATGCTGGTTTGGGCCGACATGTAGAACACTCGTCGC 9660 
PVRPLPHHAGLGRHVEHSSQ 



9661 AGGCGGCGGCTGTAGCGCTTGGGCCGATGCAGGTTGCCAGTGCGACGACCGGAGTCGCGG 9720 
AAAVALGPMQVASATTGVAG 



9721 GGGACGGGCACCAGGCCGGCCGCCGAGGCCAGGTGACCGGCGTCGGCGTAGGCCGTGAGG 9780 
DGHQAGRRGQVTGVGVGREV 



9781 TCGCCGGCGGCGACGACGAACTCGGCGCCGAGGATCGGCCCCATGCCCGGCAGAGACTCG 9840
AGGDDELGAEDRPHARQRLD 

9841 ATGATCTCGGCCTGTGGATGGCTGCGGAACGTCTCGCGGATCTGCTGGTCAATCCGCTTC 9900 
DLGLWMAAERLADLLVNPLQ 

9901 AGACGGTCGTCCAGGGCCAGGATCTGCGCGGCCAGGTCAGCCACGATCTGGGCGGCGACG 9960
TVVQGQDLRGQVSHDLGGDV

9961 TCCTCCCCGGGCAGCGCGGTCTGCTGAGCCTGGGCAGCCTCCAGCGCCGTCGCGGCGACG 10020
LPGQRGLLSLGSLQRRRGDG 

10021 GCGTCGGCACCGCGCACGCCTCGGTTGGCCAGCCAGGCCGTCAGCCGGGCCCGGCCGCGG 10080
VGTAHASVGQPGRQPGPAAA 

10081 CGGCGGAGAGCTGCCGGGGTCTGGTAGCCCGTCAGCAGGACCAGCGCGCCCTTCTGCGAG 10140
AESCRGLVARQQDQRALLRA 

10141 CTGTAGTCGAAGGCCCGTTCCAGCGCGGGGAAGACGCCGGTCAGCGTGTCGCGGAGACGG 10200 
VVEGPFQRGEDAGQRVAETV 

10201 TTGATCATCCTGACCCGGTCGGCCACGAGGTCGGAACGGTGGGCGGTCAGCAGCGCGAGG 10260
DHPDPVGHEVGTVGGQQREV 



10261 TCGGCGGCCAGCTGGGCGGGCACGTCGATCGACGCGAAGTCCCGTCGGTTGCGGGCGGTT 10320 
GGQLGGHVDRREVPSVAGGF 



10321 TCGGCGATGACGTAGGCGTCGCGGGCGTCGGTCTTCGCCTCGCCCCGGTAAGCGCCGGAC 10380 
GDDVGVAGVGLRLAPVSAGH 



10381 ATGCGGTTGACCGTGCGGCCGGGCACGTAGACGGCCTGCTGGCCGTGGGCCGCGAGCAGG 10440 
AVDRAAGHVDGLLAVGREQG 



10441 GCCAGCAGCAGCGCGGAGGACGTGCCGGAGATGTCCACTGCCCAGTGGACCTCGTCGGCC 10500 
QQQRGGRAGDVHCPVDLVGQ 



10501 AGGTCGAGGATCTCACCCATGGCGGTCAGGATCGCCGACTCATCGTTGCCGATCTTCTTC 10560 
VEDLTHGGQDRRLIVADLLR 



10561 GACCACAGCGTCACACCGGTCTCGTCGACCACCGCCGCCCAGTGATGCCCCTTGCCCGCG 10620
PQRHTGLVDHRRPVMPLARV 



10621 TCGATCCCGGCCCAGACCCGGGCCCGTCGCTCGCCCACTCGCCCCTCCTCACTCCGAACA 10680
DPGPDPGPSLAHSPLLTPNS 



10681 GCATCCCGTCGACCCGAGGAACACCCCGCTGTCATCTCCGTAAAAAGCGACCGAAGCGCA 10740
IPSTRGTPRCHLRKKRPKRT



10741 CATCTCAATCAGCAGCCAGGGCGCCCCGGAGAACCGGGCGGCCACTCCTTGTAAGCCACT 10800 
SQSAARAPRRTGRPLLVSH* 



10801 GACGGCAGAGAACCATAAGCCACACCCGGCCCTCCCGGGCCGCCTAACAACTTACGGAGA 10860



10861 ACCATGACTGACCTGCCGTTGCGTACCGTCGCACTCACCGGTGAGGAGAGCGCGGAGGTC 10920
MTDLPLRTVALTGEESAEV 
(orf34) 

10921 GACGACCTGCTGCGCACGCTGGCCGACGTGCCGGTCGACTCCACCGTGGGACTGCTGCAC 10980
DDLLRTLADVPVDSTVGLLH 



10981 CGCACCCGGCTCGCCGCACAGGAACTGCCGCTGCGCATCCGCGCCGAGCTCACGGGGATG 11040
RTRLAAQELPLRIRAELTGM 



11041 CGGCTCTACGACAGCCCGCGCGCCCTGGTCGTCACGGGCTTCGGCGTCGACGACGAACGG 11100
RLYDSPRALVVTGFGVDDER 



11101 ATCGGACCGACCCCCGCGGCCCGTCCCGCCCCGGATCCCGAGCGGACCCGCGACCTGGAG 11160 
IGPTPAARPAPDPERTRDLE 



11161 CTGCTGCTTTTGCTGCACGCGGCCCTGCTCGGCGAGGCGTTCGGCTGGGCGACCCAGCAG 11220
LLLLLHAALLGEAFGWATQQ

11221 AACGGCCGGCTCGTCCACGACGTGCTGCCCGTTCCCGGTGAGGAGACCGCGCAGATGGGT 11280
NGRLVHDVLPVPGEETAQMG 



11281 TCCAGCAGCGAGACCGAGCTGCTGTGGCACACCGAGGACGCGTTCCACCCGCTGCGCTGC 11340
SSSETELLWHTEDAFHPLRC 



11341 GACTACGTGGGCCTGCTGTGCCTGCGCAACCACCAGCGCGCCGCGACCACCGTGGGCTGG 11400 
DYVGLLCLRNHQRAATTVGW 



11401 CCCGACCTGTCCCGGCTCACCACCGAGGACCGTGCCGTGCTCCTCGAACCCCGCTATCTG 11460
PDLSRLTTEDRAVLLEPRYL 



11461 ATCCGCCCGGACACCTCGCACACGCCCGCGCAGAACGCGACGGGCACGCGGTCCGCCGAG 11520 
IRPDTSHTPAQNATGTRSAE



11521 CGTTTCGCGGCGATCGCCGAGATGGACGACGCCCCGGAGCGCGTCGCCGTCCTGTTCGGC 11580
RFAAIAEMDDAPERVAVLFG 

11581 GACCCCGAGGACCCGTACCTGCGGATCGACCCGGCCTACATGAGCCCGGCCCCCGGGGAC 11640 
DPEDPYLRIDPAYMSPAPGD 



11641 GCGGCCGCCCGGCGGGCGTACGACACCGTCACCGCGCTCATCGAGGACGAGCTGCGGCAC 11700
AAARRAYDTVTALIEDELRH 



11701 GTCGTCCTGGACGCCGGTTCACTGCTGCTGGTCGACAACTACCAGGCGGTGCACGGCCGC 11760 
VVLDAGSLLLVDNYQAVHGR 



11761 AAGCCGTTCGCCGCCGCCTACGACGGCCGCGACCGCTGGCTCAAACGCGTCAACATCACC 11820 
KPFAAAYDGRDRWLKRVNIT 



11821 CGCGACCTGCGCCGTTCCCGGTCCGCGCGGCGGTCGGCCACCTCGCTGCTGGTGTGAGGG 11880 
RDLRRSRSARRSATSLLV* 



11881 AGGCACCATGGATTTCCCCCTCACCCGCGTCAACCCCTGGTTCAGCGGCGGCTGCGACGG 11940
MDFPLTRVNPWFSGGCDG 
(orf35) 

11941 CCGCCCCCGGGTGCGGCTGTGCGCGCTGCCGTACGCGGGCGGCACCGCCGCCGTCTTCAA 12000 
RPRVRLCALPYAGGTAAVFK 



12001 GGACTGGCCCGCCGCGCTGCCCCCCGGAGTGGAGCTGCTCACCGCGCACCTGCCGGGACG 12060
DWPAALPPGVELLTAHLPGR 



12061 CGGCGACCGGTTCACCGAACCGCCCCCGGCCACCCTGGAGGAGACCGCCGAGCGGCTGTG 12120 
GDRFTEPPPATLEETAERLC 



12121 CGAGGCGCTGCCGCCGAGTGACCTGCCCACGGTCGTCCTCGGCCACAGCATGGGCGCCCT 12180 
EALPPSDLPTVVLGHSMGAL 



12181 GCTGGGGTACGAAGTGGCGGCGCGGCTCGCGGCGCGGGGCCGCGCCCCCAACCTGCTGAT 12240
LGYEVAARLAARGRAPNLLI 



12241 CGCCGCGGCCTGCCGTCCCCCGCACGTTCCGCCGGACGCCTCCGGTCCGGTGACCGAGGC 12300
AAACRPPHVPPDASGPVTEA 



12301 CGAGCTGGCGGCCACCCTGCGGGCCGAACGCCCATGGGACACGGCCCTGAGGGACGAGGA 12360
ELAATLRAERPWDTALRDEE 



12361 ACTGATGGAAGCGGTGCTGCCCGCCCTGGTCGCCGACATCACGGCCGGCGACCGCTACCA 12420
LMEAVLPALVADITAGDRYH 



12421 CCGCCCGCGGCCCCGCCCGCTCGACCTCCCGCTGAAGGTCTACATCGGCGCCGACGACGA 12480
RPRPRPLDLPLKVYIGADDD



12481 CGGCACCGACTGGCGCACCACCCTGGGCTGGCGCGCGTGCACCGCCCGGGACTGCGAGGT 12540
GTDWRTTLGWRACTARDCEV 



12541 CGTCGTCCTGCCCGGCGGCCACTACTTCCTGGAGACCGACCGCGCGGCCGTCCTCACCCG 12600
VVLPGGHYFLETDRAAVLTR 

12601 CGTCGCCACGGACCTCGCCGAAGCCGAGGTAGGGGCATGACCGCGCGCGTCGACGCCACA 12660
VATDLAEAEVGA* 

MTARVDAT 
(orf36)

12661 CCCACCTACCTGGCGGTGCTGGCGGTGCGCGAGGCCCGCGCCCCGCTCCTCGGCAGCTGC 12720
PTYLAVLAVREARAPLLGSC 



12721 CTGGCCCGCATGTCCTTCGCGGTGCTGCCGCTCGCCCTGCTGCTGTCGGTCCGGGACGCG 12780
LARMSFAVLPLALLLSVRDA 



12781 ACGGGGTCGTTCGCCGTCGCCGGACTGACCTCCGGCGCGCTGTCGGCCACGCTCACGCTG 12840
TGSFAVAGLTSGALSATLTL 



12841 TTCGCGCCCGCCCGCGCCCGGCTGATCGACCGCCGGGGCTCACGGTCCGGACTGGTCCGG 12900
FAPARARLIDRRGSRSGLVR 



12901 CTGACCGTCCCGTACCTGCTGGGGCTCGCCGTGCTGATCACATTGGCCGAGGCGGAAGCG 12960
LTVPYLLGLAVLITLAEAEA 



PTAALLVAAAVAGVFAPPLG 



13021 CCGACCATGCGCGTGCTGTGGGCGAGGATCCTGCACGGCCGTCAGCCCCTGCTGCACACC 13080
PTMRVLWARILHGRQPLLHT 



13081 GCCTACGCCCTCGACTCCGTCACCGAGGAGGTGGTCTTCACCGTGGGGCCGCTGCTGGCG 13140
AYALDSVTEEVVFTVGPLLA 



13141 GGCGGCCTGATCGCGGTCGCGGCACCGCTCGCGTCGATGATCACGGTCATGGTGCTGATC 13200
GGLIAVAAPLASMITVMVLI



13201 GCGGCCGGTACCGCCTGCTTCGTGCTGTCCGCCGCGACCGCCGCCGCCCCCGCGTCGGGC 13260
AAGTACFVLSAATAAAPASG 



13261 GAAGCCGACGAGGACCGGCCGCACGGCCGGCCCATGGCTCTGCCCGGGATGCGCACGATC 13320 
EADEDRPHGRPMALPGMRTI 



13321 GTGCTGTCCTTCGGCGGCGTCGGCCTGGTCGTCGGGGTGCTCCAGGTCGTCCTGCCGTTC 13380
VLSFGGVGLVVGVLQVVLPF 

13381 ATCGCCGACCACGCGGGCTCGCCCGGCGCGGGCGGCATCCTGCTGTCCATGCTGTCGGCG 13440
IADHAGSPGAGGILLSMLSA 



13441 GGCAGCGCGGTCGGCGGCCTCGCCTACGGGCGGATCGCCTGGCGCTCGACGCCCGTGCGG 13500 
GSAVGGLAYGRIAWRSTPVR 



13501 CGGTTCGTGGTGCTCGTCACCGGGTTCACGCTGGCGGTGCTGCCGCTGTGCCTGACCGCG 13560 
RFVVLVTGFTLAVLPLCLTA 



13561 AGCCCGGTGCCGGCCGGGGCCTTCGCCCTCCTCGTGGGACTCTGCCTCGCCCCGCTGTTC 13620
SPVPAGAFALLVGLCLAPLF



13621 ACCACCGCCTACCTGCTGGTCAACGACCTGGTGACGGCGTCGGGGACCGCACCCACCGAG 13680
TTAYLLVNDLVTASGTAPTE

13681 GCCAACACCTGGGTCTCCACGGCCAATAACGGAGGGTTCGCCGCGGGCAGCGCCGCCGCC 13740
ANTWVSTANNGGFAAGSAAA 



13741 GGTGTGCTGCTCGACTCCCGGGGCCCCACCGTCACCGTCACCGCCGCGTTCGCGGTCGCC 13800 
GVLLDSRGPTVTVTAAFAVA 



13801 GCCGCGACCGCCGTCATGACCGTTCTGCGCCGCCGGACCCTGCTCCTCGGCGCCGGACAC 13860
AATAVMTVLRRRTLLLGAGH 



13861 CCCGAACCGGCCGCCGCCACACCCGCCGACCGCACCGCACCCGCCGAAGCCGAGGAGTGA 13920
PEPAAATPADRTAPAEAEE* 



13921 ACCGATCGTGTCCAAGAACGCGGCGCACTGGTCGCGCATCCGCACAGGGGACGCCCCCGG 13980
MSKNAAHWSRIRTGDAPG 
(orf37) 

13981 CGTCGTACTCGCCGTGGACTTCTACGGAACGGGCCGCCAGGAAGCCACCTTCCGCCACCT 14040
VVLAVDFYGTGRQEATFRHL



14041 GTGCGACCTGCTCACGGATCCGGTCGAGGTCTGGCACGCGGTCCCGCCCGCCCCGGACGG 14100
CDLLTDPVEVWHAVPPAPDG 



14101 CGACTGGTCCACGGCCACCGGCGCCGGTCACCTGCGCTGGTGGACCGAGGGGCTCGACAC 14160 
DWSTATGAGHLRWWTEGLDT 



14161 GGTCCTCGCGGGACGGCCGGTGCGGGCCCTCGTCGGCTACTGCGCGGGCGGCGTCTTCGC 14220
VLAGRPVRALVGYCAGGVFA 



14221 CTCGGCCCTCGCCGACGCCCTCGTCGAACGGGAGGGCCACCGGCCGCGGGTCGTGCTGTT 14280
SALADALVEREGHRPRVVLF 



14281 CAACCCCAGCGCGCCCGGCGTCGCCACGCTCACCCGCGACTTCCGCGGTCTGATCGCCGG 14340
NPSAPGVATLTRDFRGLIAG 



14341 CATGGACCTCCTCACGGACGGGGAACGCGCCGCTCTGCTGGCCGAGACGACCGCGATCCG 14400 
MDLLTDGERAALLAETTAIR 



14401 GCGGGCACACGCCCCCGACGCGCTGGTACCGGTCGCCGAACGCTACGCCGCCCTGTACCG 14460
RAHAPDALVPVAERYAALYR 



14461 CGAGGGCTGCGACCTCCTGTGCGAGCGGCTCGGCGTGGACGCCTCCTTCGGCGCCGAACT 14520
EGCDLLCERLGVDASFGAEL 



14521 GGCCGCCGTCCTCCACTCCTACCTGGCCTACCTCACGGCGGCGCTCGACGTGCCCCCCAC 14580
AAVLHSYLAYLTAALDVPPT 



14581 CCCGCTGTGGCGCGGCGCCGTCTCGCTCACCTCCCGCGAGCACCAGGGCACCGACTTCAC 14640
PLWRGAVSLTSREHQGTDFT 



14641 CGACGTCGAGCACGGCTTCGACGTCGCCCGTGCCGAACTGCTGAGCTCCCCCCAGGTCGT 14700
DVEHGFDVARAELLSSPQVV



14701 CGCGGCGCTGACCGCGCTCCTCCGCGAACACGAGGCGAGCCGATGACCCTCACCCTGCGG 14760
AALTALLREHEASR* 

MTLTLR
(orf38) 

14761 GACGCCTTCCTCGACCAGGCCGCCCGGACCCCCGACGCCCACGCCGTCGTACACGGCGAC 14820
DAFLDQAARTPDAHAVVHGD

14821 ACTGTATGGACGTACCGCGAACTGGAACTGCGGGCCGGCCGCATGGCCCGGACGCTGGCC 14880
TVWTYRELELRAGRMARTLA

14881 GCACGCGGCGCGGGCCCCGGCACGCTGGTGGCGGTACGCCTGCCGCGCGGTCCCGAACCG 14940
ARGAGPGTLVAVRLPRGPEP 



14941 GTCGCCGCGCTCCTCGCGGTCGTGCTGACGGGAGCGGGCTACGTGCCGCTCGCCGACGAC 15000
VAALLAVVLTGAGYVPLADD 



15001 GACCCGCCGGACCGGTGCCGGCACATCCTCGACGACTGCGCCGCCGCGCTGCTGCTGGCC 15060
DPPDRCRHILDDCAAALLLA 



15061 GAGCACCCCTCGCGGGACGGACGCACCCTCACCCCGGACGAGGCGCTGGCACCCGCCCGC 15120
EHPSRDGRTLTPDEALAPAR 



15121 CCGTTCGACGCGGGCCCGGTGCGGGCCGGCGACCCGGCGTACGTGATCTACACCTCCGGC 15180 
PFDAAPVRAGDPAYVIYTSG 



15181 TCCAGTGGCCGTCCGAAGGGCGTGCTGGTCGAACAGGGCGCGCTCGGCGCCTACCTGGCA 15240
SSGRPKGVLVEQGALGAYLA 



15241 CAGGCCCGCGCGCGCTACGACGGGCTGTCCGGACGGACGGTGCTGCACTCCTCGCTGTCC 15300 
QARARYDGLSGRTVLHSSLS 



15301 TTCGACATGGCCGTGACCAGTCTGTGGGGCCCGCTCGTCAGCGGCGGCGCGATCCACGTG 15360 
FDMAVTSLWGPLVSGGAIHV 



15361 CTCGACCTGAAGGCGATCGCCTCCGGCACCCAGCCGCCGCCCGCCGCCTCGGCACGTCCG 15420
LDLKAIASGTQPPPAASARP 



15421 TCCTTCCTCAAGGTCACTCCGTCCCACCTGCCGCTGCTGGGCCTGCTGCCGGACTCCTGC 15480 
SFLKVTPSHLPLLGLLPDSC 



15481 CTGCCCACCGGGCAACTCGTGATCGGCGGCGAGGCGCTGACCGGCTCCGCGCTCGGACCC 15540
LPTGQLVIGGEALTGSALGP 



15541 TGGCGCGCCGCGCACCCCGACGTCACGGTCGTCAACGAGTACGGGCCCACCGAGGCGACC 15600
WRAAHPDVTVVNEYGPTEAT 



15601 GTCGGCTGCTGCGCGTACACCGTCCGCCCCGGTGACGCCGTGGACCCGGGTGCCGTCCCC 15660
VGCCAYTVRPGDAVDPGAVP 



15661 ATCGGACGGCCGTTCGCGGGCACCCGCCTGTACGTGCTCGACGCGGACGGCGAGCCGGTC 15720
IGRPFAGTRLYVLDADGEPV 



15721 GCCGTGGGCGGTGTGGGTGAACTGCACATCGCGGGCGACCAGTTGGCGCGCGGATACCTG 15780
AVGGVGELHIAGDQLARGYL 



15781 GGGCGCCCGCGGCTGACCGAGGAACGCTTCGTCCCGGACCCGTTCGCCGCCGACGGCTCC 15840
GRPRLTEERFVPDPFAADGS 



15841 CGGATGTACCGCACCGGCGACCTGGTGCGCGAACGCCCGGACGGCGACCTGGAGTACCTC 15900 
RMYRTGDLVRERPDGDLEYL 



15901 GGGCGCGCGGACGGGCAGGTGAAGGTCTCCGGGTACCGGATCGAGCCCGGCGAGATCGAG 15960 
GRADGQVKVSGYRIEPGEIE 

15961 GCCGTGCTCCGCGGCCACGCGGGGGTGAGGGACTGCGCGGTCGTCGCCGTCGGCGAGGCG 16020
AVLRGHAGVRDCAVVAVGEA 



16021 GACGCCCGCCGGCTCGTCGCCTACGTGGTACCGGACCCGGACTCCCCGCCCGGCACCGCC 16080 
DARRLVAYVVPDPDSPPGTA



16081 GCGCCGGCGCGGCACGCGGCCGAGGCGCTGCCGCCGTACATGGTGCCGGCGACGTTCGTC 16140
APARHAAEALPPYMVPATFV



16141 ACCGTGCCCGAACTGCCGCTCACCCCCAACGGGAAGCTCGACCGGGACGCGCTGCCCGGC 16200 
TVPELPLTPNGKLDRDALPG 



16201 CCCCCTGCCGGCGACGCCGGGCCGGGCGACCGCACCCCGGCCGAGACCCTGCTGTGCGAG 16260
PPAGDAGPGDRTPAETLLCE 



16261 CTGCTGGCACGGGCCCTGGGCATCCCGGAGATCGACGCCGACGCCGACTTCCTGACGTCC 16320
LLARALGIPEIDADADFLTS 



16321 GGCGGCACCAGCATCACCGCGCTGAAGCTGGTCGCCGGCGCCCGCCGGGTCGGCATCCGC 16380 
GGTSITALKLVAGARRVGIR



16381 CTCGAACTCACCACCGTCCTGCGCGAACGCACGGTGCGCCGCATCCTGGCGGCCCAGCCC 16440
LELTTVLRERTVRRILAAQP 

16441 GACGCCGCCTCGCCCCTCGCCGAAGGAGTGCCCGAGTGACCGGTTCCGTAACGCTCACCC 16500 
MTGSVTLTP 
DAASPLAEGVPE* (orf39) 

16501 CCCTCGGCGGGATCATCCCCAGGCCCCGCGGCGAGGGGCTCACCACCGGCGCCGAGTACG 16560
LGGIIPRPRGEGLTTGAEYD



16561 ACCTGGGGCCGCTCGGCGACGCGGGCCCCGACTGgGTGCGGGCCCACGGCCCGCGACTGC 16620 
LGPLGDAGPDWVRAHGPRLR 



16621 GCGAGCGCCTCGCCACCGACGGGCTGATCCTGCTGCACGGTCTGCCCACCGACGGAGACG 16680 
ERLATDGLILLHGLPTDGDG 



16681 GCGTCGACGGCTTCCACGACGTCGTCGGCTCCGTCGGCGGCGACCCGCTGCCCTACACCG 16740 
VDGFHDVVGSVGGDPLPYTE 



16741 AGCGCTCCACCCCGCGCAGCGTGGTCAAGGGCAACATCTACACCTCGACCGAGTACCCGG 16800 
RSTPRSVVKGNIYTSTEYPA 



16801 CCGACCAGCCCATCCCGATGCACAACGAGAACTCCTACGCCGCCCATTGGCCGTCCACGC 16860
DQPIPMHNENSYAAHWPSTL 



16861 TCTACTTCTTCTGCCACACCGCGCCGGACACCGGCGGGGCCACGCCGATCGCCGACGGCC 16920
YFFCHTAPDTGGATPIADGR 



16921 GCGCCGTCCTCGACCTCATCCCGGCCGAGGTCAGGCGGCGGTTCTCCCAAGGGGTCGTCT 16980 
AVLDLIPAEVRRRFSQGVVY



16981 ACACCCGTACGTTCCGCGCCGACATGGGACTGAGCTGGCAGGAAGCGTTCCAGACCGAGG 17040
TRTFRADMGLSWQEAFQTED 



17041 ACCGCGGCGACGTCGAACGCCATTGCCGCGCCCACGGCCAGGAGTTCTCCTGGGACGGCG 17100
RGDVERHCRAHGQEFSWDGD 



17101 ACGTCCTGCGCACCCGCCACCACCGCCCGGCGACCGCCGTCGACCCCGGCACCGGAGCCG 17160
VLRTRHHRPATAVDPGTGAE 



17161 AGGTGTGGTTCAACCAGGCGCACCTGTTCCACCCGTCCAGCCTGGATCCCGACCTGCGCC 17220 
VWFNQAHLFHPSSLDPDLRQ 



17221 AGGTGCTCCTGGAGACGTACGGCGAGAACGGCCTGCCCCGCGACGCCCTGTTCGCCGACG 17280 
VLLETYGENGLPRDALFADG 



17281 GCACCCCGATCCCCGACGCCGACCTGGCAACGGTCCGCGCGGCCTACACCCGCGCCGCGC 17340 
TPIPDADLATVRAAYTRAAL

17461 GAGCCGTGCCGACGCATCGGCACGCCGTCCTCCCGTCGGGGCGCTACCATCGCCGCTGTC 17520

17521 TCGGCCATCACCCCACCCGGGCGGAGGCAACCGGCCGTGCACATCCCCGCCGTGGTCGCC 17580

17581 ACGGCACGCGCGATCACCCGCGCCATGACCGCCCAGCCCGTTGTCACATCTGCGGAGGCG 17640

17641 CCGCGATGACAGAGGTCCGAGGTGAACTGATCCGGGCGCTGCCGGGTGTGCTGGAGGCGC 17700
MTEVRGELIRALPGVLEAR 
(orf40) 



EELLGVPLLDGY

1 8 S 0 1 AGACCTGCGGCAAGATCACGGTTGAGCGGCTCGGCGGCTCCCGGGAGGGCGGTTGCCGGG
TCGKITVERLGGSREGGCR 



SEQ ID NO: 3 BLM gene PPTase ORFS 41 



GGATCCTGCGCTACCCGGACTTCGCCCAGTGGTGCGGCACCGAGCTCACCGCCGACTGGCACGTCCGCTTCCGGGCCGCC 
GCCGCGGTCTACGGGCATCTGCACATCCCCCGCGTGACCCGGTACGACGGCGTCCGCTTCGAGGAGGTGTCGGTCGGCTA 
CCCGCGCGAGTGGCGGCCCCGGCCGCCCCGCGAGCCGCTCCGGCAGATCCTGCCCCAGCCCGTCGACGAGCCGGGAGCCC 
TCTGGTGATCGCCGCCCTCCTGCCCTCCTGGGCCGTCACCGAACACGCCTTCACCGACGCCCCGGACGACCCGGTGAGCC 



M I A A 



P S 



EHAFTDAPDD 



401 GCCCGCGCCGCCCTCGGCCGGCTGGGCCTCCCGCCCGGTCCGCTGCTGCCCGGCCGACGGGGCGCGCCGAGCTGGCCGGA



ARAALGRLG 



P G P L L P G 



RGAPSWPD 



CGGGGTGGTGGGGAGCATGACGCACTGTCAGGGCTTCCGGGGCGCCGCGGTCGCCCGGGCCGCCGACGCCGCGTCGCTCG 



S M T H C Q G 



R G A A 



ARAADAAS 



721 GTACCCGCTGACCGGCCTGGAGCTGGACTTCGACGAGGCCGAGCTGGCCGTCGATCCGGACGCCGGGACGTTCACGGCCC 



L D F D E A 



L A V D 



D A G T F T A 



1041 GGCGCCGGCCCGGCGGGCCCTCCGCCGTGCGGAGCGGAGGCCCGGCGCGGACGCGCCCGGTGTCGTCGGATACGTGCGTC
1121 AGTCGGCGACGCAGACGTTGCCGTTGGTCGAGTTGAGCAGCCCGACGATGTCGATGGTGTTGCCGCAGAGGTTGATGGGG
1201 ATGTGGACGGGGATCTGGATGACGTTGCCCGAGACGACGCCCGGGGAGCCGACGGCCGCCCCCTTGGCGTTCGAGTCGGC
1281 GAGGGCGGTGCCGGAGACGCCGGCGAGCGCCGTGCCCACGGTGGCGGTGAGGGCCGCTGCCTTGGCGATTCGTGACATGG
1361 GGTGACACCTTCGTTCGGTCTGACAGGGTCGAGCTCACGGCCTCTGACGGCCGGGAGCCCGGATCAACGCCCGATCACCC
1441 CGAAGGTTTCGAATCGTGCGGCGGACGGGTGACCGGCGGCCGAACGGCCTCGCCGGGCCCCCGGAAGGTGCCATGACGTC
1521 CGTGCGCCATCTGTACAGCCCGGTCCCGCGCCGCGTACAAGGGACGGACGGACGGCCGGTGGACGGACGACCGGCGGGGA
1601 GGGGAGGCCATGAGCCGGATCGCGATCGTCGGGGCGGGTCAGGCCGGACTGCATCTGGCGCTGGGGCTGCTGGGGGCGGG
1681 GAGCGGCTCTTCCCGTCACGAGGTGCTGCTCGTGTCCGACGGGACGCCGGACGAGATCCGCGCCGGGCGGGTGCGGTCGA
1761 C 1761






